Ensemble Techniques Project

Problem statement:

The goal is to classify patients into their respective labels using attributes extracted from their voice recordings.

Perform exploratory data analysis and build models to predict which patients are affected by Parkinson's disease (PD). An accurate prediction would be an effective screening step prior to an appointment with a clinician.

In [294]:
# Import Basic packages
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns
from scipy import stats; from scipy.stats import zscore, norm, randint
import matplotlib.style as style; style.use('fivethirtyeight')
import plotly.express as px
%matplotlib inline


# Model building: LR, KNN, NB, SVC, decision tree, ensemble models
from sklearn import metrics, model_selection, svm
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Data preprocessing: scalers and pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler



# Suppress warnings
import warnings; warnings.filterwarnings('ignore')

# Visualize Tree
from sklearn.tree import export_graphviz
from IPython.display import Image
from os import system

# Display settings
pd.options.display.max_rows = 10000
pd.options.display.max_columns = 10000

random_state = 42
np.random.seed(random_state)

1. Load the dataset

In [8]:
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data')
data.head()
Out[8]:
name MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR status RPDE DFA spread1 spread2 D2 PPE
0 phon_R01_S01_1 119.992 157.302 74.997 0.00784 0.00007 0.00370 0.00554 0.01109 0.04374 0.426 0.02182 0.03130 0.02971 0.06545 0.02211 21.033 1 0.414783 0.815285 -4.813031 0.266482 2.301442 0.284654
1 phon_R01_S01_2 122.400 148.650 113.819 0.00968 0.00008 0.00465 0.00696 0.01394 0.06134 0.626 0.03134 0.04518 0.04368 0.09403 0.01929 19.085 1 0.458359 0.819521 -4.075192 0.335590 2.486855 0.368674
2 phon_R01_S01_3 116.682 131.111 111.555 0.01050 0.00009 0.00544 0.00781 0.01633 0.05233 0.482 0.02757 0.03858 0.03590 0.08270 0.01309 20.651 1 0.429895 0.825288 -4.443179 0.311173 2.342259 0.332634
3 phon_R01_S01_4 116.676 137.871 111.366 0.00997 0.00009 0.00502 0.00698 0.01505 0.05492 0.517 0.02924 0.04005 0.03772 0.08771 0.01353 20.644 1 0.434969 0.819235 -4.117501 0.334147 2.405554 0.368975
4 phon_R01_S01_5 116.014 141.781 110.655 0.01284 0.00011 0.00655 0.00908 0.01966 0.06425 0.584 0.03490 0.04825 0.04465 0.10470 0.01767 19.649 1 0.417356 0.823484 -3.747787 0.234513 2.332180 0.410335

2. It is always good practice to eyeball the raw data to get a feel for it: the number of records, the structure of the file, the number and types of attributes, and a general idea of the likely challenges in the dataset. Mention a few comments in this regard.
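As a rough sketch of what such a first look might collect, the snippet below summarizes record count, attribute count, attribute types, missing values, and class balance. To run without a network connection it uses four rows and five columns copied from the raw file printed in this notebook, not the full dataset. The helper name first_look and the variables sample_csv, sample, and summary are hypothetical, introduced only for illustration.

```python
import io

import pandas as pd

# A few rows copied from the raw file shown in this notebook, trimmed to a
# handful of columns (assumption: this subset is for illustration only).
sample_csv = """name,MDVP:Fo(Hz),HNR,status,PPE
phon_R01_S01_1,119.992,21.033,1,0.284654
phon_R01_S01_2,122.400,19.085,1,0.368674
phon_R01_S07_1,197.076,26.775,0,0.085569
phon_R01_S07_2,199.228,30.940,0,0.068501
"""


def first_look(df, target="status"):
    """Hypothetical helper: gather the basic facts worth checking by eye."""
    return {
        "n_rows": df.shape[0],                                # number of records
        "n_cols": df.shape[1],                                # number of attributes
        "dtypes": df.dtypes.astype(str).to_dict(),            # type of each attribute
        "missing": int(df.isna().sum().sum()),                # total missing cells
        "class_counts": df[target].value_counts().to_dict(),  # label balance
    }


sample = pd.read_csv(io.StringIO(sample_csv))
summary = first_look(sample)
print(summary)
```

The same helper could be called on the full DataFrame loaded in step 1, where the class counts give an early hint of whether the labels are imbalanced.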

In [9]:
# Move the target column (status) to the end of the DataFrame.
# Work on a copy so the original DataFrame stays untouched.
pdata = data.copy()
target = pdata.pop('status')  # remove the column and keep its values
pdata['status'] = target      # re-append it as the last column
pdata.head()
Out[9]:
name MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE status
0 phon_R01_S01_1 119.992 157.302 74.997 0.00784 0.00007 0.00370 0.00554 0.01109 0.04374 0.426 0.02182 0.03130 0.02971 0.06545 0.02211 21.033 0.414783 0.815285 -4.813031 0.266482 2.301442 0.284654 1
1 phon_R01_S01_2 122.400 148.650 113.819 0.00968 0.00008 0.00465 0.00696 0.01394 0.06134 0.626 0.03134 0.04518 0.04368 0.09403 0.01929 19.085 0.458359 0.819521 -4.075192 0.335590 2.486855 0.368674 1
2 phon_R01_S01_3 116.682 131.111 111.555 0.01050 0.00009 0.00544 0.00781 0.01633 0.05233 0.482 0.02757 0.03858 0.03590 0.08270 0.01309 20.651 0.429895 0.825288 -4.443179 0.311173 2.342259 0.332634 1
3 phon_R01_S01_4 116.676 137.871 111.366 0.00997 0.00009 0.00502 0.00698 0.01505 0.05492 0.517 0.02924 0.04005 0.03772 0.08771 0.01353 20.644 0.434969 0.819235 -4.117501 0.334147 2.405554 0.368975 1
4 phon_R01_S01_5 116.014 141.781 110.655 0.01284 0.00011 0.00655 0.00908 0.01966 0.06425 0.584 0.03490 0.04825 0.04465 0.10470 0.01767 19.649 0.417356 0.823484 -3.747787 0.234513 2.332180 0.410335 1
In [10]:
# Open the raw file and print its contents; the with block closes the file afterwards
with open("Data - Parkinsons", "r") as pd_data:
    print(pd_data.read())
name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
phon_R01_S01_1,119.99200,157.30200,74.99700,0.00784,0.00007,0.00370,0.00554,0.01109,0.04374,0.42600,0.02182,0.03130,0.02971,0.06545,0.02211,21.03300,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
phon_R01_S01_2,122.40000,148.65000,113.81900,0.00968,0.00008,0.00465,0.00696,0.01394,0.06134,0.62600,0.03134,0.04518,0.04368,0.09403,0.01929,19.08500,1,0.458359,0.819521,-4.075192,0.335590,2.486855,0.368674
phon_R01_S01_3,116.68200,131.11100,111.55500,0.01050,0.00009,0.00544,0.00781,0.01633,0.05233,0.48200,0.02757,0.03858,0.03590,0.08270,0.01309,20.65100,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
phon_R01_S01_4,116.67600,137.87100,111.36600,0.00997,0.00009,0.00502,0.00698,0.01505,0.05492,0.51700,0.02924,0.04005,0.03772,0.08771,0.01353,20.64400,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
phon_R01_S01_5,116.01400,141.78100,110.65500,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,0.58400,0.03490,0.04825,0.04465,0.10470,0.01767,19.64900,1,0.417356,0.823484,-3.747787,0.234513,2.332180,0.410335
phon_R01_S01_6,120.55200,131.16200,113.78700,0.00968,0.00008,0.00463,0.00750,0.01388,0.04701,0.45600,0.02328,0.03526,0.03243,0.06985,0.01222,21.37800,1,0.415564,0.825069,-4.242867,0.299111,2.187560,0.357775
phon_R01_S02_1,120.26700,137.24400,114.82000,0.00333,0.00003,0.00155,0.00202,0.00466,0.01608,0.14000,0.00779,0.00937,0.01351,0.02337,0.00607,24.88600,1,0.596040,0.764112,-5.634322,0.257682,1.854785,0.211756
phon_R01_S02_2,107.33200,113.84000,104.31500,0.00290,0.00003,0.00144,0.00182,0.00431,0.01567,0.13400,0.00829,0.00946,0.01256,0.02487,0.00344,26.89200,1,0.637420,0.763262,-6.167603,0.183721,2.064693,0.163755
phon_R01_S02_3,95.73000,132.06800,91.75400,0.00551,0.00006,0.00293,0.00332,0.00880,0.02093,0.19100,0.01073,0.01277,0.01717,0.03218,0.01070,21.81200,1,0.615551,0.773587,-5.498678,0.327769,2.322511,0.231571
phon_R01_S02_4,95.05600,120.10300,91.22600,0.00532,0.00006,0.00268,0.00332,0.00803,0.02838,0.25500,0.01441,0.01725,0.02444,0.04324,0.01022,21.86200,1,0.547037,0.798463,-5.011879,0.325996,2.432792,0.271362
phon_R01_S02_5,88.33300,112.24000,84.07200,0.00505,0.00006,0.00254,0.00330,0.00763,0.02143,0.19700,0.01079,0.01342,0.01892,0.03237,0.01166,21.11800,1,0.611137,0.776156,-5.249770,0.391002,2.407313,0.249740
phon_R01_S02_6,91.90400,115.87100,86.29200,0.00540,0.00006,0.00281,0.00336,0.00844,0.02752,0.24900,0.01424,0.01641,0.02214,0.04272,0.01141,21.41400,1,0.583390,0.792520,-4.960234,0.363566,2.642476,0.275931
phon_R01_S04_1,136.92600,159.86600,131.27600,0.00293,0.00002,0.00118,0.00153,0.00355,0.01259,0.11200,0.00656,0.00717,0.01140,0.01968,0.00581,25.70300,1,0.460600,0.646846,-6.547148,0.152813,2.041277,0.138512
phon_R01_S04_2,139.17300,179.13900,76.55600,0.00390,0.00003,0.00165,0.00208,0.00496,0.01642,0.15400,0.00728,0.00932,0.01797,0.02184,0.01041,24.88900,1,0.430166,0.665833,-5.660217,0.254989,2.519422,0.199889
phon_R01_S04_3,152.84500,163.30500,75.83600,0.00294,0.00002,0.00121,0.00149,0.00364,0.01828,0.15800,0.01064,0.00972,0.01246,0.03191,0.00609,24.92200,1,0.474791,0.654027,-6.105098,0.203653,2.125618,0.170100
phon_R01_S04_4,142.16700,217.45500,83.15900,0.00369,0.00003,0.00157,0.00203,0.00471,0.01503,0.12600,0.00772,0.00888,0.01359,0.02316,0.00839,25.17500,1,0.565924,0.658245,-5.340115,0.210185,2.205546,0.234589
phon_R01_S04_5,144.18800,349.25900,82.76400,0.00544,0.00004,0.00211,0.00292,0.00632,0.02047,0.19200,0.00969,0.01200,0.02074,0.02908,0.01859,22.33300,1,0.567380,0.644692,-5.440040,0.239764,2.264501,0.218164
phon_R01_S04_6,168.77800,232.18100,75.60300,0.00718,0.00004,0.00284,0.00387,0.00853,0.03327,0.34800,0.01441,0.01893,0.03430,0.04322,0.02919,20.37600,1,0.631099,0.605417,-2.931070,0.434326,3.007463,0.430788
phon_R01_S05_1,153.04600,175.82900,68.62300,0.00742,0.00005,0.00364,0.00432,0.01092,0.05517,0.54200,0.02471,0.03572,0.05767,0.07413,0.03160,17.28000,1,0.665318,0.719467,-3.949079,0.357870,3.109010,0.377429
phon_R01_S05_2,156.40500,189.39800,142.82200,0.00768,0.00005,0.00372,0.00399,0.01116,0.03995,0.34800,0.01721,0.02374,0.04310,0.05164,0.03365,17.15300,1,0.649554,0.686080,-4.554466,0.340176,2.856676,0.322111
phon_R01_S05_3,153.84800,165.73800,65.78200,0.00840,0.00005,0.00428,0.00450,0.01285,0.03810,0.32800,0.01667,0.02383,0.04055,0.05000,0.03871,17.53600,1,0.660125,0.704087,-4.095442,0.262564,2.739710,0.365391
phon_R01_S05_4,153.88000,172.86000,78.12800,0.00480,0.00003,0.00232,0.00267,0.00696,0.04137,0.37000,0.02021,0.02591,0.04525,0.06062,0.01849,19.49300,1,0.629017,0.698951,-5.186960,0.237622,2.557536,0.259765
phon_R01_S05_5,167.93000,193.22100,79.06800,0.00442,0.00003,0.00220,0.00247,0.00661,0.04351,0.37700,0.02228,0.02540,0.04246,0.06685,0.01280,22.46800,1,0.619060,0.679834,-4.330956,0.262384,2.916777,0.285695
phon_R01_S05_6,173.91700,192.73500,86.18000,0.00476,0.00003,0.00221,0.00258,0.00663,0.04192,0.36400,0.02187,0.02470,0.03772,0.06562,0.01840,20.42200,1,0.537264,0.686894,-5.248776,0.210279,2.547508,0.253556
phon_R01_S06_1,163.65600,200.84100,76.77900,0.00742,0.00005,0.00380,0.00390,0.01140,0.01659,0.16400,0.00738,0.00948,0.01497,0.02214,0.01778,23.83100,1,0.397937,0.732479,-5.557447,0.220890,2.692176,0.215961
phon_R01_S06_2,104.40000,206.00200,77.96800,0.00633,0.00006,0.00316,0.00375,0.00948,0.03767,0.38100,0.01732,0.02245,0.03780,0.05197,0.02887,22.06600,1,0.522746,0.737948,-5.571843,0.236853,2.846369,0.219514
phon_R01_S06_3,171.04100,208.31300,75.50100,0.00455,0.00003,0.00250,0.00234,0.00750,0.01966,0.18600,0.00889,0.01169,0.01872,0.02666,0.01095,25.90800,1,0.418622,0.720916,-6.183590,0.226278,2.589702,0.147403
phon_R01_S06_4,146.84500,208.70100,81.73700,0.00496,0.00003,0.00250,0.00275,0.00749,0.01919,0.19800,0.00883,0.01144,0.01826,0.02650,0.01328,25.11900,1,0.358773,0.726652,-6.271690,0.196102,2.314209,0.162999
phon_R01_S06_5,155.35800,227.38300,80.05500,0.00310,0.00002,0.00159,0.00176,0.00476,0.01718,0.16100,0.00769,0.01012,0.01661,0.02307,0.00677,25.97000,1,0.470478,0.676258,-7.120925,0.279789,2.241742,0.108514
phon_R01_S06_6,162.56800,198.34600,77.63000,0.00502,0.00003,0.00280,0.00253,0.00841,0.01791,0.16800,0.00793,0.01057,0.01799,0.02380,0.01170,25.67800,1,0.427785,0.723797,-6.635729,0.209866,1.957961,0.135242
phon_R01_S07_1,197.07600,206.89600,192.05500,0.00289,0.00001,0.00166,0.00168,0.00498,0.01098,0.09700,0.00563,0.00680,0.00802,0.01689,0.00339,26.77500,0,0.422229,0.741367,-7.348300,0.177551,1.743867,0.085569
phon_R01_S07_2,199.22800,209.51200,192.09100,0.00241,0.00001,0.00134,0.00138,0.00402,0.01015,0.08900,0.00504,0.00641,0.00762,0.01513,0.00167,30.94000,0,0.432439,0.742055,-7.682587,0.173319,2.103106,0.068501
phon_R01_S07_3,198.38300,215.20300,193.10400,0.00212,0.00001,0.00113,0.00135,0.00339,0.01263,0.11100,0.00640,0.00825,0.00951,0.01919,0.00119,30.77500,0,0.465946,0.738703,-7.067931,0.175181,1.512275,0.096320
phon_R01_S07_4,202.26600,211.60400,197.07900,0.00180,0.000009,0.00093,0.00107,0.00278,0.00954,0.08500,0.00469,0.00606,0.00719,0.01407,0.00072,32.68400,0,0.368535,0.742133,-7.695734,0.178540,1.544609,0.056141
phon_R01_S07_5,203.18400,211.52600,196.16000,0.00178,0.000009,0.00094,0.00106,0.00283,0.00958,0.08500,0.00468,0.00610,0.00726,0.01403,0.00065,33.04700,0,0.340068,0.741899,-7.964984,0.163519,1.423287,0.044539
phon_R01_S07_6,201.46400,210.56500,195.70800,0.00198,0.000010,0.00105,0.00115,0.00314,0.01194,0.10700,0.00586,0.00760,0.00957,0.01758,0.00135,31.73200,0,0.344252,0.742737,-7.777685,0.170183,2.447064,0.057610
phon_R01_S08_1,177.87600,192.92100,168.01300,0.00411,0.00002,0.00233,0.00241,0.00700,0.02126,0.18900,0.01154,0.01347,0.01612,0.03463,0.00586,23.21600,1,0.360148,0.778834,-6.149653,0.218037,2.477082,0.165827
phon_R01_S08_2,176.17000,185.60400,163.56400,0.00369,0.00002,0.00205,0.00218,0.00616,0.01851,0.16800,0.00938,0.01160,0.01491,0.02814,0.00340,24.95100,1,0.341435,0.783626,-6.006414,0.196371,2.536527,0.173218
phon_R01_S08_3,180.19800,201.24900,175.45600,0.00284,0.00002,0.00153,0.00166,0.00459,0.01444,0.13100,0.00726,0.00885,0.01190,0.02177,0.00231,26.73800,1,0.403884,0.766209,-6.452058,0.212294,2.269398,0.141929
phon_R01_S08_4,187.73300,202.32400,173.01500,0.00316,0.00002,0.00168,0.00182,0.00504,0.01663,0.15100,0.00829,0.01003,0.01366,0.02488,0.00265,26.31000,1,0.396793,0.758324,-6.006647,0.266892,2.382544,0.160691
phon_R01_S08_5,186.16300,197.72400,177.58400,0.00298,0.00002,0.00165,0.00175,0.00496,0.01495,0.13500,0.00774,0.00941,0.01233,0.02321,0.00231,26.82200,1,0.326480,0.765623,-6.647379,0.201095,2.374073,0.130554
phon_R01_S08_6,184.05500,196.53700,166.97700,0.00258,0.00001,0.00134,0.00147,0.00403,0.01463,0.13200,0.00742,0.00901,0.01234,0.02226,0.00257,26.45300,1,0.306443,0.759203,-7.044105,0.063412,2.361532,0.115730
phon_R01_S10_1,237.22600,247.32600,225.22700,0.00298,0.00001,0.00169,0.00182,0.00507,0.01752,0.16400,0.01035,0.01024,0.01133,0.03104,0.00740,22.73600,0,0.305062,0.654172,-7.310550,0.098648,2.416838,0.095032
phon_R01_S10_2,241.40400,248.83400,232.48300,0.00281,0.00001,0.00157,0.00173,0.00470,0.01760,0.15400,0.01006,0.01038,0.01251,0.03017,0.00675,23.14500,0,0.457702,0.634267,-6.793547,0.158266,2.256699,0.117399
phon_R01_S10_3,243.43900,250.91200,232.43500,0.00210,0.000009,0.00109,0.00137,0.00327,0.01419,0.12600,0.00777,0.00898,0.01033,0.02330,0.00454,25.36800,0,0.438296,0.635285,-7.057869,0.091608,2.330716,0.091470
phon_R01_S10_4,242.85200,255.03400,227.91100,0.00225,0.000009,0.00117,0.00139,0.00350,0.01494,0.13400,0.00847,0.00879,0.01014,0.02542,0.00476,25.03200,0,0.431285,0.638928,-6.995820,0.102083,2.365800,0.102706
phon_R01_S10_5,245.51000,262.09000,231.84800,0.00235,0.000010,0.00127,0.00148,0.00380,0.01608,0.14100,0.00906,0.00977,0.01149,0.02719,0.00476,24.60200,0,0.467489,0.631653,-7.156076,0.127642,2.392122,0.097336
phon_R01_S10_6,252.45500,261.48700,182.78600,0.00185,0.000007,0.00092,0.00113,0.00276,0.01152,0.10300,0.00614,0.00730,0.00860,0.01841,0.00432,26.80500,0,0.610367,0.635204,-7.319510,0.200873,2.028612,0.086398
phon_R01_S13_1,122.18800,128.61100,115.76500,0.00524,0.00004,0.00169,0.00203,0.00507,0.01613,0.14300,0.00855,0.00776,0.01433,0.02566,0.00839,23.16200,0,0.579597,0.733659,-6.439398,0.266392,2.079922,0.133867
phon_R01_S13_2,122.96400,130.04900,114.67600,0.00428,0.00003,0.00124,0.00155,0.00373,0.01681,0.15400,0.00930,0.00802,0.01400,0.02789,0.00462,24.97100,0,0.538688,0.754073,-6.482096,0.264967,2.054419,0.128872
phon_R01_S13_3,124.44500,135.06900,117.49500,0.00431,0.00003,0.00141,0.00167,0.00422,0.02184,0.19700,0.01241,0.01024,0.01685,0.03724,0.00479,25.13500,0,0.553134,0.775933,-6.650471,0.254498,1.840198,0.103561
phon_R01_S13_4,126.34400,134.23100,112.77300,0.00448,0.00004,0.00131,0.00169,0.00393,0.02033,0.18500,0.01143,0.00959,0.01614,0.03429,0.00474,25.03000,0,0.507504,0.760361,-6.689151,0.291954,2.431854,0.105993
phon_R01_S13_5,128.00100,138.05200,122.08000,0.00436,0.00003,0.00137,0.00166,0.00411,0.02297,0.21000,0.01323,0.01072,0.01677,0.03969,0.00481,24.69200,0,0.459766,0.766204,-7.072419,0.220434,1.972297,0.119308
phon_R01_S13_6,129.33600,139.86700,118.60400,0.00490,0.00004,0.00165,0.00183,0.00495,0.02498,0.22800,0.01396,0.01219,0.01947,0.04188,0.00484,25.42900,0,0.420383,0.785714,-6.836811,0.269866,2.223719,0.147491
phon_R01_S16_1,108.80700,134.65600,102.87400,0.00761,0.00007,0.00349,0.00486,0.01046,0.02719,0.25500,0.01483,0.01609,0.02067,0.04450,0.01036,21.02800,1,0.536009,0.819032,-4.649573,0.205558,1.986899,0.316700
phon_R01_S16_2,109.86000,126.35800,104.43700,0.00874,0.00008,0.00398,0.00539,0.01193,0.03209,0.30700,0.01789,0.01992,0.02454,0.05368,0.01180,20.76700,1,0.558586,0.811843,-4.333543,0.221727,2.014606,0.344834
phon_R01_S16_3,110.41700,131.06700,103.37000,0.00784,0.00007,0.00352,0.00514,0.01056,0.03715,0.33400,0.02032,0.02302,0.02802,0.06097,0.00969,21.42200,1,0.541781,0.821364,-4.438453,0.238298,1.922940,0.335041
phon_R01_S16_4,117.27400,129.91600,110.40200,0.00752,0.00006,0.00299,0.00469,0.00898,0.02293,0.22100,0.01189,0.01459,0.01948,0.03568,0.00681,22.81700,1,0.530529,0.817756,-4.608260,0.290024,2.021591,0.314464
phon_R01_S16_5,116.87900,131.89700,108.15300,0.00788,0.00007,0.00334,0.00493,0.01003,0.02645,0.26500,0.01394,0.01625,0.02137,0.04183,0.00786,22.60300,1,0.540049,0.813432,-4.476755,0.262633,1.827012,0.326197
phon_R01_S16_6,114.84700,271.31400,104.68000,0.00867,0.00008,0.00373,0.00520,0.01120,0.03225,0.35000,0.01805,0.01974,0.02519,0.05414,0.01143,21.66000,1,0.547975,0.817396,-4.609161,0.221711,1.831691,0.316395
phon_R01_S17_1,209.14400,237.49400,109.37900,0.00282,0.00001,0.00147,0.00152,0.00442,0.01861,0.17000,0.00975,0.01258,0.01382,0.02925,0.00871,25.55400,0,0.341788,0.678874,-7.040508,0.066994,2.460791,0.101516
phon_R01_S17_2,223.36500,238.98700,98.66400,0.00264,0.00001,0.00154,0.00151,0.00461,0.01906,0.16500,0.01013,0.01296,0.01340,0.03039,0.00301,26.13800,0,0.447979,0.686264,-7.293801,0.086372,2.321560,0.098555
phon_R01_S17_3,222.23600,231.34500,205.49500,0.00266,0.00001,0.00152,0.00144,0.00457,0.01643,0.14500,0.00867,0.01108,0.01200,0.02602,0.00340,25.85600,0,0.364867,0.694399,-6.966321,0.095882,2.278687,0.103224
phon_R01_S17_4,228.83200,234.61900,223.63400,0.00296,0.00001,0.00175,0.00155,0.00526,0.01644,0.14500,0.00882,0.01075,0.01179,0.02647,0.00351,25.96400,0,0.256570,0.683296,-7.245620,0.018689,2.498224,0.093534
phon_R01_S17_5,229.40100,252.22100,221.15600,0.00205,0.000009,0.00114,0.00113,0.00342,0.01457,0.12900,0.00769,0.00957,0.01016,0.02308,0.00300,26.41500,0,0.276850,0.673636,-7.496264,0.056844,2.003032,0.073581
phon_R01_S17_6,228.96900,239.54100,113.20100,0.00238,0.00001,0.00136,0.00140,0.00408,0.01745,0.15400,0.00942,0.01160,0.01234,0.02827,0.00420,24.54700,0,0.305429,0.681811,-7.314237,0.006274,2.118596,0.091546
phon_R01_S18_1,140.34100,159.77400,67.02100,0.00817,0.00006,0.00430,0.00440,0.01289,0.03198,0.31300,0.01830,0.01810,0.02428,0.05490,0.02183,19.56000,1,0.460139,0.720908,-5.409423,0.226850,2.359973,0.226156
phon_R01_S18_2,136.96900,166.60700,66.00400,0.00923,0.00007,0.00507,0.00463,0.01520,0.03111,0.30800,0.01638,0.01759,0.02603,0.04914,0.02659,19.97900,1,0.498133,0.729067,-5.324574,0.205660,2.291558,0.226247
phon_R01_S18_3,143.53300,162.21500,65.80900,0.01101,0.00008,0.00647,0.00467,0.01941,0.05384,0.47800,0.03152,0.02422,0.03392,0.09455,0.04882,20.33800,1,0.513237,0.731444,-5.869750,0.151814,2.118496,0.185580
phon_R01_S18_4,148.09000,162.82400,67.34300,0.00762,0.00005,0.00467,0.00354,0.01400,0.05428,0.49700,0.03357,0.02494,0.03635,0.10070,0.02431,21.71800,1,0.487407,0.727313,-6.261141,0.120956,2.137075,0.141958
phon_R01_S18_5,142.72900,162.40800,65.47600,0.00831,0.00006,0.00469,0.00419,0.01407,0.03485,0.36500,0.01868,0.01906,0.02949,0.05605,0.02599,20.26400,1,0.489345,0.730387,-5.720868,0.158830,2.277927,0.180828
phon_R01_S18_6,136.35800,176.59500,65.75000,0.00971,0.00007,0.00534,0.00478,0.01601,0.04978,0.48300,0.02749,0.02466,0.03736,0.08247,0.03361,18.57000,1,0.543299,0.733232,-5.207985,0.224852,2.642276,0.242981
phon_R01_S19_1,120.08000,139.71000,111.20800,0.00405,0.00003,0.00180,0.00220,0.00540,0.01706,0.15200,0.00974,0.00925,0.01345,0.02921,0.00442,25.74200,1,0.495954,0.762959,-5.791820,0.329066,2.205024,0.188180
phon_R01_S19_2,112.01400,588.51800,107.02400,0.00533,0.00005,0.00268,0.00329,0.00805,0.02448,0.22600,0.01373,0.01375,0.01956,0.04120,0.00623,24.17800,1,0.509127,0.789532,-5.389129,0.306636,1.928708,0.225461
phon_R01_S19_3,110.79300,128.10100,107.31600,0.00494,0.00004,0.00260,0.00283,0.00780,0.02442,0.21600,0.01432,0.01325,0.01831,0.04295,0.00479,25.43800,1,0.437031,0.815908,-5.313360,0.201861,2.225815,0.244512
phon_R01_S19_4,110.70700,122.61100,105.00700,0.00516,0.00005,0.00277,0.00289,0.00831,0.02215,0.20600,0.01284,0.01219,0.01715,0.03851,0.00472,25.19700,1,0.463514,0.807217,-5.477592,0.315074,1.862092,0.228624
phon_R01_S19_5,112.87600,148.82600,106.98100,0.00500,0.00004,0.00270,0.00289,0.00810,0.03999,0.35000,0.02413,0.02231,0.02704,0.07238,0.00905,23.37000,1,0.489538,0.789977,-5.775966,0.341169,2.007923,0.193918
phon_R01_S19_6,110.56800,125.39400,106.82100,0.00462,0.00004,0.00226,0.00280,0.00677,0.02199,0.19700,0.01284,0.01199,0.01636,0.03852,0.00420,25.82000,1,0.429484,0.816340,-5.391029,0.250572,1.777901,0.232744
phon_R01_S20_1,95.38500,102.14500,90.26400,0.00608,0.00006,0.00331,0.00332,0.00994,0.03202,0.26300,0.01803,0.01886,0.02455,0.05408,0.01062,21.87500,1,0.644954,0.779612,-5.115212,0.249494,2.017753,0.260015
phon_R01_S20_2,100.77000,115.69700,85.54500,0.01038,0.00010,0.00622,0.00576,0.01865,0.03121,0.36100,0.01773,0.01783,0.02139,0.05320,0.02220,19.20000,1,0.594387,0.790117,-4.913885,0.265699,2.398422,0.277948
phon_R01_S20_3,96.10600,108.66400,84.51000,0.00694,0.00007,0.00389,0.00415,0.01168,0.04024,0.36400,0.02266,0.02451,0.02876,0.06799,0.01823,19.05500,1,0.544805,0.770466,-4.441519,0.155097,2.645959,0.327978
phon_R01_S20_4,95.60500,107.71500,87.54900,0.00702,0.00007,0.00428,0.00371,0.01283,0.03156,0.29600,0.01792,0.01841,0.02190,0.05377,0.01825,19.65900,1,0.576084,0.778747,-5.132032,0.210458,2.232576,0.260633
phon_R01_S20_5,100.96000,110.01900,95.62800,0.00606,0.00006,0.00351,0.00348,0.01053,0.02427,0.21600,0.01371,0.01421,0.01751,0.04114,0.01237,20.53600,1,0.554610,0.787896,-5.022288,0.146948,2.428306,0.264666
phon_R01_S20_6,98.80400,102.30500,87.80400,0.00432,0.00004,0.00247,0.00258,0.00742,0.02223,0.20200,0.01277,0.01343,0.01552,0.03831,0.00882,22.24400,1,0.576644,0.772416,-6.025367,0.078202,2.053601,0.177275
phon_R01_S21_1,176.85800,205.56000,75.34400,0.00747,0.00004,0.00418,0.00420,0.01254,0.04795,0.43500,0.02679,0.03022,0.03510,0.08037,0.05470,13.89300,1,0.556494,0.729586,-5.288912,0.343073,3.099301,0.242119
phon_R01_S21_2,180.97800,200.12500,155.49500,0.00406,0.00002,0.00220,0.00244,0.00659,0.03852,0.33100,0.02107,0.02493,0.02877,0.06321,0.02782,16.17600,1,0.583574,0.727747,-5.657899,0.315903,3.098256,0.200423
phon_R01_S21_3,178.22200,202.45000,141.04700,0.00321,0.00002,0.00163,0.00194,0.00488,0.03759,0.32700,0.02073,0.02415,0.02784,0.06219,0.03151,15.92400,1,0.598714,0.712199,-6.366916,0.335753,2.654271,0.144614
phon_R01_S21_4,176.28100,227.38100,125.61000,0.00520,0.00003,0.00287,0.00312,0.00862,0.06511,0.58000,0.03671,0.04159,0.04683,0.11012,0.04824,13.92200,1,0.602874,0.740837,-5.515071,0.299549,3.136550,0.220968
phon_R01_S21_5,173.89800,211.35000,74.67700,0.00448,0.00003,0.00237,0.00254,0.00710,0.06727,0.65000,0.03788,0.04254,0.04802,0.11363,0.04214,14.73900,1,0.599371,0.743937,-5.783272,0.299793,3.007096,0.194052
phon_R01_S21_6,179.71100,225.93000,144.87800,0.00709,0.00004,0.00391,0.00419,0.01172,0.04313,0.44200,0.02297,0.02768,0.03455,0.06892,0.07223,11.86600,1,0.590951,0.745526,-4.379411,0.375531,3.671155,0.332086
phon_R01_S21_7,166.60500,206.00800,78.03200,0.00742,0.00004,0.00387,0.00453,0.01161,0.06640,0.63400,0.03650,0.04282,0.05114,0.10949,0.08725,11.74400,1,0.653410,0.733165,-4.508984,0.389232,3.317586,0.301952
phon_R01_S22_1,151.95500,163.33500,147.22600,0.00419,0.00003,0.00224,0.00227,0.00672,0.07959,0.77200,0.04421,0.04962,0.05690,0.13262,0.01658,19.66400,1,0.501037,0.714360,-6.411497,0.207156,2.344876,0.134120
phon_R01_S22_2,148.27200,164.98900,142.29900,0.00459,0.00003,0.00250,0.00256,0.00750,0.04190,0.38300,0.02383,0.02521,0.03051,0.07150,0.01914,18.78000,1,0.454444,0.734504,-5.952058,0.087840,2.344336,0.186489
phon_R01_S22_3,152.12500,161.46900,76.59600,0.00382,0.00003,0.00191,0.00226,0.00574,0.05925,0.63700,0.03341,0.03794,0.04398,0.10024,0.01211,20.96900,1,0.447456,0.697790,-6.152551,0.173520,2.080121,0.160809
phon_R01_S22_4,157.82100,172.97500,68.40100,0.00358,0.00002,0.00196,0.00196,0.00587,0.03716,0.30700,0.02062,0.02321,0.02764,0.06185,0.00850,22.21900,1,0.502380,0.712170,-6.251425,0.188056,2.143851,0.160812
phon_R01_S22_5,157.44700,163.26700,149.60500,0.00369,0.00002,0.00201,0.00197,0.00602,0.03272,0.28300,0.01813,0.01909,0.02571,0.05439,0.01018,21.69300,1,0.447285,0.705658,-6.247076,0.180528,2.344348,0.164916
phon_R01_S22_6,159.11600,168.91300,144.81100,0.00342,0.00002,0.00178,0.00184,0.00535,0.03381,0.30700,0.01806,0.02024,0.02809,0.05417,0.00852,22.66300,1,0.366329,0.693429,-6.417440,0.194627,2.473239,0.151709
phon_R01_S24_1,125.03600,143.94600,116.18700,0.01280,0.00010,0.00743,0.00623,0.02228,0.03886,0.34200,0.02135,0.02174,0.03088,0.06406,0.08151,15.33800,1,0.629574,0.714485,-4.020042,0.265315,2.671825,0.340623
phon_R01_S24_2,125.79100,140.55700,96.20600,0.01378,0.00011,0.00826,0.00655,0.02478,0.04689,0.42200,0.02542,0.02630,0.03908,0.07625,0.10323,15.43300,1,0.571010,0.690892,-5.159169,0.202146,2.441612,0.260375
phon_R01_S24_3,126.51200,141.75600,99.77000,0.01936,0.00015,0.01159,0.00990,0.03476,0.06734,0.65900,0.03611,0.03963,0.05783,0.10833,0.16744,12.43500,1,0.638545,0.674953,-3.760348,0.242861,2.634633,0.378483
phon_R01_S24_4,125.64100,141.06800,116.34600,0.03316,0.00026,0.02144,0.01522,0.06433,0.09178,0.89100,0.05358,0.04791,0.06196,0.16074,0.31482,8.86700,1,0.671299,0.656846,-3.700544,0.260481,2.991063,0.370961
phon_R01_S24_5,128.45100,150.44900,75.63200,0.01551,0.00012,0.00905,0.00909,0.02716,0.06170,0.58400,0.03223,0.03672,0.05174,0.09669,0.11843,15.06000,1,0.639808,0.643327,-4.202730,0.310163,2.638279,0.356881
phon_R01_S24_6,139.22400,586.56700,66.15700,0.03011,0.00022,0.01854,0.01628,0.05563,0.09419,0.93000,0.05551,0.05005,0.06023,0.16654,0.25930,10.48900,1,0.596362,0.641418,-3.269487,0.270641,2.690917,0.444774
phon_R01_S25_1,150.25800,154.60900,75.34900,0.00248,0.00002,0.00105,0.00136,0.00315,0.01131,0.10700,0.00522,0.00659,0.01009,0.01567,0.00495,26.75900,1,0.296888,0.722356,-6.878393,0.089267,2.004055,0.113942
phon_R01_S25_2,154.00300,160.26700,128.62100,0.00183,0.00001,0.00076,0.00100,0.00229,0.01030,0.09400,0.00469,0.00582,0.00871,0.01406,0.00243,28.40900,1,0.263654,0.691483,-7.111576,0.144780,2.065477,0.093193
phon_R01_S25_3,149.68900,160.36800,133.60800,0.00257,0.00002,0.00116,0.00134,0.00349,0.01346,0.12600,0.00660,0.00818,0.01059,0.01979,0.00578,27.42100,1,0.365488,0.719974,-6.997403,0.210279,1.994387,0.112878
phon_R01_S25_4,155.07800,163.73600,144.14800,0.00168,0.00001,0.00068,0.00092,0.00204,0.01064,0.09700,0.00522,0.00632,0.00928,0.01567,0.00233,29.74600,1,0.334171,0.677930,-6.981201,0.184550,2.129924,0.106802
phon_R01_S25_5,151.88400,157.76500,133.75100,0.00258,0.00002,0.00115,0.00122,0.00346,0.01450,0.13700,0.00633,0.00788,0.01267,0.01898,0.00659,26.83300,1,0.393563,0.700246,-6.600023,0.249172,2.499148,0.105306
phon_R01_S25_6,151.98900,157.33900,132.85700,0.00174,0.00001,0.00075,0.00096,0.00225,0.01024,0.09300,0.00455,0.00576,0.00993,0.01364,0.00238,29.92800,1,0.311369,0.676066,-6.739151,0.160686,2.296873,0.115130
phon_R01_S26_1,193.03000,208.90000,80.29700,0.00766,0.00004,0.00450,0.00389,0.01351,0.03044,0.27500,0.01771,0.01815,0.02084,0.05312,0.00947,21.93400,1,0.497554,0.740539,-5.845099,0.278679,2.608749,0.185668
phon_R01_S26_2,200.71400,223.98200,89.68600,0.00621,0.00003,0.00371,0.00337,0.01112,0.02286,0.20700,0.01192,0.01439,0.01852,0.03576,0.00704,23.23900,1,0.436084,0.727863,-5.258320,0.256454,2.550961,0.232520
phon_R01_S26_3,208.51900,220.31500,199.02000,0.00609,0.00003,0.00368,0.00339,0.01105,0.01761,0.15500,0.00952,0.01058,0.01307,0.02855,0.00830,22.40700,1,0.338097,0.712466,-6.471427,0.184378,2.502336,0.136390
phon_R01_S26_4,204.66400,221.30000,189.62100,0.00841,0.00004,0.00502,0.00485,0.01506,0.02378,0.21000,0.01277,0.01483,0.01767,0.03831,0.01316,21.30500,1,0.498877,0.722085,-4.876336,0.212054,2.376749,0.268144
phon_R01_S26_5,210.14100,232.70600,185.25800,0.00534,0.00003,0.00321,0.00280,0.00964,0.01680,0.14900,0.00861,0.01017,0.01301,0.02583,0.00620,23.67100,1,0.441097,0.722254,-5.963040,0.250283,2.489191,0.177807
phon_R01_S26_6,206.32700,226.35500,92.02000,0.00495,0.00002,0.00302,0.00246,0.00905,0.02105,0.20900,0.01107,0.01284,0.01604,0.03320,0.01048,21.86400,1,0.331508,0.715121,-6.729713,0.181701,2.938114,0.115515
phon_R01_S27_1,151.87200,492.89200,69.08500,0.00856,0.00006,0.00404,0.00385,0.01211,0.01843,0.23500,0.00796,0.00832,0.01271,0.02389,0.06051,23.69300,1,0.407701,0.662668,-4.673241,0.261549,2.702355,0.274407
phon_R01_S27_2,158.21900,442.55700,71.94800,0.00476,0.00003,0.00214,0.00207,0.00642,0.01458,0.14800,0.00606,0.00747,0.01312,0.01818,0.01554,26.35600,1,0.450798,0.653823,-6.051233,0.273280,2.640798,0.170106
phon_R01_S27_3,170.75600,450.24700,79.03200,0.00555,0.00003,0.00244,0.00261,0.00731,0.01725,0.17500,0.00757,0.00971,0.01652,0.02270,0.01802,25.69000,1,0.486738,0.676023,-4.597834,0.372114,2.975889,0.282780
phon_R01_S27_4,178.28500,442.82400,82.06300,0.00462,0.00003,0.00157,0.00194,0.00472,0.01279,0.12900,0.00617,0.00744,0.01151,0.01851,0.00856,25.02000,1,0.470422,0.655239,-4.913137,0.393056,2.816781,0.251972
phon_R01_S27_5,217.11600,233.48100,93.97800,0.00404,0.00002,0.00127,0.00128,0.00381,0.01299,0.12400,0.00679,0.00631,0.01075,0.02038,0.00681,24.58100,1,0.462516,0.582710,-5.517173,0.389295,2.925862,0.220657
phon_R01_S27_6,128.94000,479.69700,88.25100,0.00581,0.00005,0.00241,0.00314,0.00723,0.02008,0.22100,0.00849,0.01117,0.01734,0.02548,0.02350,24.74300,1,0.487756,0.684130,-6.186128,0.279933,2.686240,0.152428
phon_R01_S27_7,176.82400,215.29300,83.96100,0.00460,0.00003,0.00209,0.00221,0.00628,0.01169,0.11700,0.00534,0.00630,0.01104,0.01603,0.01161,27.16600,1,0.400088,0.656182,-4.711007,0.281618,2.655744,0.234809
phon_R01_S31_1,138.19000,203.52200,83.34000,0.00704,0.00005,0.00406,0.00398,0.01218,0.04479,0.44100,0.02587,0.02567,0.03220,0.07761,0.01968,18.30500,1,0.538016,0.741480,-5.418787,0.160267,2.090438,0.229892
phon_R01_S31_2,182.01800,197.17300,79.18700,0.00842,0.00005,0.00506,0.00449,0.01517,0.02503,0.23100,0.01372,0.01580,0.01931,0.04115,0.01813,18.78400,1,0.589956,0.732903,-5.445140,0.142466,2.174306,0.215558
phon_R01_S31_3,156.23900,195.10700,79.82000,0.00694,0.00004,0.00403,0.00395,0.01209,0.02343,0.22400,0.01289,0.01420,0.01720,0.03867,0.02020,19.19600,1,0.618663,0.728421,-5.944191,0.143359,1.929715,0.181988
phon_R01_S31_4,145.17400,198.10900,80.63700,0.00733,0.00005,0.00414,0.00422,0.01242,0.02362,0.23300,0.01235,0.01495,0.01944,0.03706,0.01874,18.85700,1,0.637518,0.735546,-5.594275,0.127950,1.765957,0.222716
phon_R01_S31_5,138.14500,197.23800,81.11400,0.00544,0.00004,0.00294,0.00327,0.00883,0.02791,0.24600,0.01484,0.01805,0.02259,0.04451,0.01794,18.17800,1,0.623209,0.738245,-5.540351,0.087165,1.821297,0.214075
phon_R01_S31_6,166.88800,198.96600,79.51200,0.00638,0.00004,0.00368,0.00351,0.01104,0.02857,0.25700,0.01547,0.01859,0.02301,0.04641,0.01796,18.33000,1,0.585169,0.736964,-5.825257,0.115697,1.996146,0.196535
phon_R01_S32_1,119.03100,127.53300,109.21600,0.00440,0.00004,0.00214,0.00192,0.00641,0.01033,0.09800,0.00538,0.00570,0.00811,0.01614,0.01724,26.84200,1,0.457541,0.699787,-6.890021,0.152941,2.328513,0.112856
phon_R01_S32_2,120.07800,126.63200,105.66700,0.00270,0.00002,0.00116,0.00135,0.00349,0.01022,0.09000,0.00476,0.00588,0.00903,0.01428,0.00487,26.36900,1,0.491345,0.718839,-5.892061,0.195976,2.108873,0.183572
phon_R01_S32_3,120.28900,128.14300,100.20900,0.00492,0.00004,0.00269,0.00238,0.00808,0.01412,0.12500,0.00703,0.00820,0.01194,0.02110,0.01610,23.94900,1,0.467160,0.724045,-6.135296,0.203630,2.539724,0.169923
phon_R01_S32_4,120.25600,125.30600,104.77300,0.00407,0.00003,0.00224,0.00205,0.00671,0.01516,0.13800,0.00721,0.00815,0.01310,0.02164,0.01015,26.01700,1,0.468621,0.735136,-6.112667,0.217013,2.527742,0.170633
phon_R01_S32_5,119.05600,125.21300,86.79500,0.00346,0.00003,0.00169,0.00170,0.00508,0.01201,0.10600,0.00633,0.00701,0.00915,0.01898,0.00903,23.38900,1,0.470972,0.721308,-5.436135,0.254909,2.516320,0.232209
phon_R01_S32_6,118.74700,123.72300,109.83600,0.00331,0.00003,0.00168,0.00171,0.00504,0.01043,0.09900,0.00490,0.00621,0.00903,0.01471,0.00504,25.61900,1,0.482296,0.723096,-6.448134,0.178713,2.034827,0.141422
phon_R01_S33_1,106.51600,112.77700,93.10500,0.00589,0.00006,0.00291,0.00319,0.00873,0.04932,0.44100,0.02683,0.03112,0.03651,0.08050,0.03031,17.06000,1,0.637814,0.744064,-5.301321,0.320385,2.375138,0.243080
phon_R01_S33_2,110.45300,127.61100,105.55400,0.00494,0.00004,0.00244,0.00315,0.00731,0.04128,0.37900,0.02229,0.02592,0.03316,0.06688,0.02529,17.70700,1,0.653427,0.706687,-5.333619,0.322044,2.631793,0.228319
phon_R01_S33_3,113.40000,133.34400,107.81600,0.00451,0.00004,0.00219,0.00283,0.00658,0.04879,0.43100,0.02385,0.02973,0.04370,0.07154,0.02278,19.01300,1,0.647900,0.708144,-4.378916,0.300067,2.445502,0.259451
phon_R01_S33_4,113.16600,130.27000,100.67300,0.00502,0.00004,0.00257,0.00312,0.00772,0.05279,0.47600,0.02896,0.03347,0.04134,0.08689,0.03690,16.74700,1,0.625362,0.708617,-4.654894,0.304107,2.672362,0.274387
phon_R01_S33_5,112.23900,126.60900,104.09500,0.00472,0.00004,0.00238,0.00290,0.00715,0.05643,0.51700,0.03070,0.03530,0.04451,0.09211,0.02629,17.36600,1,0.640945,0.701404,-5.634576,0.306014,2.419253,0.209191
phon_R01_S33_6,116.15000,131.73100,109.81500,0.00381,0.00003,0.00181,0.00232,0.00542,0.03026,0.26700,0.01514,0.01812,0.02770,0.04543,0.01827,18.80100,1,0.624811,0.696049,-5.866357,0.233070,2.445646,0.184985
phon_R01_S34_1,170.36800,268.79600,79.54300,0.00571,0.00003,0.00232,0.00269,0.00696,0.03273,0.28100,0.01713,0.01964,0.02824,0.05139,0.02485,18.54000,1,0.677131,0.685057,-4.796845,0.397749,2.963799,0.277227
phon_R01_S34_2,208.08300,253.79200,91.80200,0.00757,0.00004,0.00428,0.00428,0.01285,0.06725,0.57100,0.04016,0.04003,0.04464,0.12047,0.04238,15.64800,1,0.606344,0.665945,-5.410336,0.288917,2.665133,0.231723
phon_R01_S34_3,198.45800,219.29000,148.69100,0.00376,0.00002,0.00182,0.00215,0.00546,0.03527,0.29700,0.02055,0.02076,0.02530,0.06165,0.01728,18.70200,1,0.606273,0.661735,-5.585259,0.310746,2.465528,0.209863
phon_R01_S34_4,202.80500,231.50800,86.23200,0.00370,0.00002,0.00189,0.00211,0.00568,0.01997,0.18000,0.01117,0.01177,0.01506,0.03350,0.02010,18.68700,1,0.536102,0.632631,-5.898673,0.213353,2.470746,0.189032
phon_R01_S34_5,202.54400,241.35000,164.16800,0.00254,0.00001,0.00100,0.00133,0.00301,0.02662,0.22800,0.01475,0.01558,0.02006,0.04426,0.01049,20.68000,1,0.497480,0.630409,-6.132663,0.220617,2.576563,0.159777
phon_R01_S34_6,223.36100,263.87200,87.63800,0.00352,0.00002,0.00169,0.00188,0.00506,0.02536,0.22500,0.01379,0.01478,0.01909,0.04137,0.01493,20.36600,1,0.566849,0.574282,-5.456811,0.345238,2.840556,0.232861
phon_R01_S35_1,169.77400,191.75900,151.45100,0.01568,0.00009,0.00863,0.00946,0.02589,0.08143,0.82100,0.03804,0.05426,0.08808,0.11411,0.07530,12.35900,1,0.561610,0.793509,-3.297668,0.414758,3.413649,0.457533
phon_R01_S35_2,183.52000,216.81400,161.34000,0.01466,0.00008,0.00849,0.00819,0.02546,0.06050,0.61800,0.02865,0.04101,0.06359,0.08595,0.06057,14.36700,1,0.478024,0.768974,-4.276605,0.355736,3.142364,0.336085
phon_R01_S35_3,188.62000,216.30200,165.98200,0.01719,0.00009,0.00996,0.01027,0.02987,0.07118,0.72200,0.03474,0.04580,0.06824,0.10422,0.08069,12.29800,1,0.552870,0.764036,-3.377325,0.335357,3.274865,0.418646
phon_R01_S35_4,202.63200,565.74000,177.25800,0.01627,0.00008,0.00919,0.00963,0.02756,0.07170,0.83300,0.03515,0.04265,0.06460,0.10546,0.07889,14.98900,1,0.427627,0.775708,-4.892495,0.262281,2.910213,0.270173
phon_R01_S35_5,186.69500,211.96100,149.44200,0.01872,0.00010,0.01075,0.01154,0.03225,0.05830,0.78400,0.02699,0.03714,0.06259,0.08096,0.10952,12.52900,1,0.507826,0.762726,-4.484303,0.340256,2.958815,0.301487
phon_R01_S35_6,192.81800,224.42900,168.79300,0.03107,0.00016,0.01800,0.01958,0.05401,0.11908,1.30200,0.05647,0.07940,0.13778,0.16942,0.21713,8.44100,1,0.625866,0.768320,-2.434031,0.450493,3.079221,0.527367
phon_R01_S35_7,198.11600,233.09900,174.47800,0.02714,0.00014,0.01568,0.01699,0.04705,0.08684,1.01800,0.04284,0.05556,0.08318,0.12851,0.16265,9.44900,1,0.584164,0.754449,-2.839756,0.356224,3.184027,0.454721
phon_R01_S37_1,121.34500,139.64400,98.25000,0.00684,0.00006,0.00388,0.00332,0.01164,0.02534,0.24100,0.01340,0.01399,0.02056,0.04019,0.04179,21.52000,1,0.566867,0.670475,-4.865194,0.246404,2.013530,0.168581
phon_R01_S37_2,119.10000,128.44200,88.83300,0.00692,0.00006,0.00393,0.00300,0.01179,0.02682,0.23600,0.01484,0.01405,0.02018,0.04451,0.04611,21.82400,1,0.651680,0.659333,-4.239028,0.175691,2.451130,0.247455
phon_R01_S37_3,117.87000,127.34900,95.65400,0.00647,0.00005,0.00356,0.00300,0.01067,0.03087,0.27600,0.01659,0.01804,0.02402,0.04977,0.02631,22.43100,1,0.628300,0.652025,-3.583722,0.207914,2.439597,0.206256
phon_R01_S37_4,122.33600,142.36900,94.79400,0.00727,0.00006,0.00415,0.00339,0.01246,0.02293,0.22300,0.01205,0.01289,0.01771,0.03615,0.03191,22.95300,1,0.611679,0.623731,-5.435100,0.230532,2.699645,0.220546
phon_R01_S37_5,117.96300,134.20900,100.75700,0.01813,0.00015,0.01117,0.00718,0.03351,0.04912,0.43800,0.02610,0.02161,0.02916,0.07830,0.10748,19.07500,1,0.630547,0.646786,-3.444478,0.303214,2.964568,0.261305
phon_R01_S37_6,126.14400,154.28400,97.54300,0.00975,0.00008,0.00593,0.00454,0.01778,0.02852,0.26600,0.01500,0.01581,0.02157,0.04499,0.03828,21.53400,1,0.635015,0.627337,-5.070096,0.280091,2.892300,0.249703
phon_R01_S39_1,127.93000,138.75200,112.17300,0.00605,0.00005,0.00321,0.00318,0.00962,0.03235,0.33900,0.01360,0.01650,0.03105,0.04079,0.02663,19.65100,1,0.654945,0.675865,-5.498456,0.234196,2.103014,0.216638
phon_R01_S39_2,114.23800,124.39300,77.02200,0.00581,0.00005,0.00299,0.00316,0.00896,0.04009,0.40600,0.01579,0.01994,0.04114,0.04736,0.02073,20.43700,1,0.653139,0.694571,-5.185987,0.259229,2.151121,0.244948
phon_R01_S39_3,115.32200,135.73800,107.80200,0.00619,0.00005,0.00352,0.00329,0.01057,0.03273,0.32500,0.01644,0.01722,0.02931,0.04933,0.02810,19.38800,1,0.577802,0.684373,-5.283009,0.226528,2.442906,0.238281
phon_R01_S39_4,114.55400,126.77800,91.12100,0.00651,0.00006,0.00366,0.00340,0.01097,0.03658,0.36900,0.01864,0.01940,0.03091,0.05592,0.02707,18.95400,1,0.685151,0.719576,-5.529833,0.242750,2.408689,0.220520
phon_R01_S39_5,112.15000,131.66900,97.52700,0.00519,0.00005,0.00291,0.00284,0.00873,0.01756,0.15500,0.00967,0.01033,0.01363,0.02902,0.01435,21.21900,1,0.557045,0.673086,-5.617124,0.184896,1.871871,0.212386
phon_R01_S39_6,102.27300,142.83000,85.90200,0.00907,0.00009,0.00493,0.00461,0.01480,0.02814,0.27200,0.01579,0.01553,0.02073,0.04736,0.03882,18.44700,1,0.671378,0.674562,-2.929379,0.396746,2.560422,0.367233
phon_R01_S42_1,236.20000,244.66300,102.13700,0.00277,0.00001,0.00154,0.00153,0.00462,0.02448,0.21700,0.01410,0.01426,0.01621,0.04231,0.00620,24.07800,0,0.469928,0.628232,-6.816086,0.172270,2.235197,0.119652
phon_R01_S42_2,237.32300,243.70900,229.25600,0.00303,0.00001,0.00173,0.00159,0.00519,0.01242,0.11600,0.00696,0.00747,0.00882,0.02089,0.00533,24.67900,0,0.384868,0.626710,-7.018057,0.176316,1.852402,0.091604
phon_R01_S42_3,260.10500,264.91900,237.30300,0.00339,0.00001,0.00205,0.00186,0.00616,0.02030,0.19700,0.01186,0.01230,0.01367,0.03557,0.00910,21.08300,0,0.440988,0.628058,-7.517934,0.160414,1.881767,0.075587
phon_R01_S42_4,197.56900,217.62700,90.79400,0.00803,0.00004,0.00490,0.00448,0.01470,0.02177,0.18900,0.01279,0.01272,0.01439,0.03836,0.01337,19.26900,0,0.372222,0.725216,-5.736781,0.164529,2.882450,0.202879
phon_R01_S42_5,240.30100,245.13500,219.78300,0.00517,0.00002,0.00316,0.00283,0.00949,0.02018,0.21200,0.01176,0.01191,0.01344,0.03529,0.00965,21.02000,0,0.371837,0.646167,-7.169701,0.073298,2.266432,0.100881
phon_R01_S42_6,244.99000,272.21000,239.17000,0.00451,0.00002,0.00279,0.00237,0.00837,0.01897,0.18100,0.01084,0.01121,0.01255,0.03253,0.01049,21.52800,0,0.522812,0.646818,-7.304500,0.171088,2.095237,0.096220
phon_R01_S43_1,112.54700,133.37400,105.71500,0.00355,0.00003,0.00166,0.00190,0.00499,0.01358,0.12900,0.00664,0.00786,0.01140,0.01992,0.00435,26.43600,0,0.413295,0.756700,-6.323531,0.218885,2.193412,0.160376
phon_R01_S43_2,110.73900,113.59700,100.13900,0.00356,0.00003,0.00170,0.00200,0.00510,0.01484,0.13300,0.00754,0.00950,0.01285,0.02261,0.00430,26.55000,0,0.369090,0.776158,-6.085567,0.192375,1.889002,0.174152
phon_R01_S43_3,113.71500,116.44300,96.91300,0.00349,0.00003,0.00171,0.00203,0.00514,0.01472,0.13300,0.00748,0.00905,0.01148,0.02245,0.00478,26.54700,0,0.380253,0.766700,-5.943501,0.192150,1.852542,0.179677
phon_R01_S43_4,117.00400,144.46600,99.92300,0.00353,0.00003,0.00176,0.00218,0.00528,0.01657,0.14500,0.00881,0.01062,0.01318,0.02643,0.00590,25.44500,0,0.387482,0.756482,-6.012559,0.229298,1.872946,0.163118
phon_R01_S43_5,115.38000,123.10900,108.63400,0.00332,0.00003,0.00160,0.00199,0.00480,0.01503,0.13700,0.00812,0.00933,0.01133,0.02436,0.00401,26.00500,0,0.405991,0.761255,-5.966779,0.197938,1.974857,0.184067
phon_R01_S43_6,116.38800,129.03800,108.97000,0.00346,0.00003,0.00169,0.00213,0.00507,0.01725,0.15500,0.00874,0.01021,0.01331,0.02623,0.00415,26.14300,0,0.361232,0.763242,-6.016891,0.109256,2.004719,0.174429
phon_R01_S44_1,151.73700,190.20400,129.85900,0.00314,0.00002,0.00135,0.00162,0.00406,0.01469,0.13200,0.00728,0.00886,0.01230,0.02184,0.00570,24.15100,1,0.396610,0.745957,-6.486822,0.197919,2.449763,0.132703
phon_R01_S44_2,148.79000,158.35900,138.99000,0.00309,0.00002,0.00152,0.00186,0.00456,0.01574,0.14200,0.00839,0.00956,0.01309,0.02518,0.00488,24.41200,1,0.402591,0.762508,-6.311987,0.182459,2.251553,0.160306
phon_R01_S44_3,148.14300,155.98200,135.04100,0.00392,0.00003,0.00204,0.00231,0.00612,0.01450,0.13100,0.00725,0.00876,0.01263,0.02175,0.00540,23.68300,1,0.398499,0.778349,-5.711205,0.240875,2.845109,0.192730
phon_R01_S44_4,150.44000,163.44100,144.73600,0.00396,0.00003,0.00206,0.00233,0.00619,0.02551,0.23700,0.01321,0.01574,0.02148,0.03964,0.00611,23.13300,1,0.352396,0.759320,-6.261446,0.183218,2.264226,0.144105
phon_R01_S44_5,148.46200,161.07800,141.99800,0.00397,0.00003,0.00202,0.00235,0.00605,0.01831,0.16300,0.00950,0.01103,0.01559,0.02849,0.00639,22.86600,1,0.408598,0.768845,-5.704053,0.216204,2.679185,0.197710
phon_R01_S44_6,149.81800,163.41700,144.78600,0.00336,0.00002,0.00174,0.00198,0.00521,0.02145,0.19800,0.01155,0.01341,0.01666,0.03464,0.00595,23.00800,1,0.329577,0.757180,-6.277170,0.109397,2.209021,0.156368
phon_R01_S49_1,117.22600,123.92500,106.65600,0.00417,0.00004,0.00186,0.00270,0.00558,0.01909,0.17100,0.00864,0.01223,0.01949,0.02592,0.00955,23.07900,0,0.603515,0.669565,-5.619070,0.191576,2.027228,0.215724
phon_R01_S49_2,116.84800,217.55200,99.50300,0.00531,0.00005,0.00260,0.00346,0.00780,0.01795,0.16300,0.00810,0.01144,0.01756,0.02429,0.01179,22.08500,0,0.663842,0.656516,-5.198864,0.206768,2.120412,0.252404
phon_R01_S49_3,116.28600,177.29100,96.98300,0.00314,0.00003,0.00134,0.00192,0.00403,0.01564,0.13600,0.00667,0.00990,0.01691,0.02001,0.00737,24.19900,0,0.598515,0.654331,-5.592584,0.133917,2.058658,0.214346
phon_R01_S49_4,116.55600,592.03000,86.22800,0.00496,0.00004,0.00254,0.00263,0.00762,0.01660,0.15400,0.00820,0.00972,0.01491,0.02460,0.01397,23.95800,0,0.566424,0.667654,-6.431119,0.153310,2.161936,0.120605
phon_R01_S49_5,116.34200,581.28900,94.24600,0.00267,0.00002,0.00115,0.00148,0.00345,0.01300,0.11700,0.00631,0.00789,0.01144,0.01892,0.00680,25.02300,0,0.528485,0.663884,-6.359018,0.116636,2.152083,0.138868
phon_R01_S49_6,114.56300,119.16700,86.64700,0.00327,0.00003,0.00146,0.00184,0.00439,0.01185,0.10600,0.00557,0.00721,0.01095,0.01672,0.00703,24.77500,0,0.555303,0.659132,-6.710219,0.149694,1.913990,0.121777
phon_R01_S50_1,201.77400,262.70700,78.22800,0.00694,0.00003,0.00412,0.00396,0.01235,0.02574,0.25500,0.01454,0.01582,0.01758,0.04363,0.04441,19.36800,0,0.508479,0.683761,-6.934474,0.159890,2.316346,0.112838
phon_R01_S50_2,174.18800,230.97800,94.26100,0.00459,0.00003,0.00263,0.00259,0.00790,0.04087,0.40500,0.02336,0.02498,0.02745,0.07008,0.02764,19.51700,0,0.448439,0.657899,-6.538586,0.121952,2.657476,0.133050
phon_R01_S50_3,209.51600,253.01700,89.48800,0.00564,0.00003,0.00331,0.00292,0.00994,0.02751,0.26300,0.01604,0.01657,0.01879,0.04812,0.01810,19.14700,0,0.431674,0.683244,-6.195325,0.129303,2.784312,0.168895
phon_R01_S50_4,174.68800,240.00500,74.28700,0.01360,0.00008,0.00624,0.00564,0.01873,0.02308,0.25600,0.01268,0.01365,0.01667,0.03804,0.10715,17.88300,0,0.407567,0.655683,-6.787197,0.158453,2.679772,0.131728
phon_R01_S50_5,198.76400,396.96100,74.90400,0.00740,0.00004,0.00370,0.00390,0.01109,0.02296,0.24100,0.01265,0.01321,0.01588,0.03794,0.07223,19.02000,0,0.451221,0.643956,-6.744577,0.207454,2.138608,0.123306
phon_R01_S50_6,214.28900,260.27700,77.97300,0.00567,0.00003,0.00295,0.00317,0.00885,0.01884,0.19000,0.01026,0.01161,0.01373,0.03078,0.04398,21.20900,0,0.462803,0.664357,-5.724056,0.190667,2.555477,0.148569

In [11]:
#To check the dimension or shape of the dataset
pdata.shape
Out[11]:
(195, 24)

This voice recording dataset contains 195 observations and 24 attributes

In [12]:
#status - Health status of the subject (one) - Parkinson's, (zero) - healthy
pdata.groupby('status').count()
Out[12]:
name MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE
status
0 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48
1 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147

Total count of Health status of the person (one) - Parkinson's : 147

Total count of Health status of the person (Zero) - Healthy : 48
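
The counts above imply an imbalanced target. As a minimal sketch (the `status` series here is rebuilt from the two counts rather than read from `pdata`), the class shares and the accuracy of a trivial majority-class predictor can be computed like this:

```python
import pandas as pd

# Class counts taken from the groupby output above (0 = healthy, 1 = Parkinson's).
status = pd.Series([1] * 147 + [0] * 48, name="status")

counts = status.value_counts()                     # absolute count per class
proportions = status.value_counts(normalize=True)  # share of each class

print(counts)
print(proportions.round(3))

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = proportions.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any classifier built later should be judged against this baseline, and metrics such as recall and F1 are worth reading alongside accuracy.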

In [13]:
# View the data type and the non-null count of each independent attribute and of the dependent attribute
pdata.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              195 non-null    object 
 1   MDVP:Fo(Hz)       195 non-null    float64
 2   MDVP:Fhi(Hz)      195 non-null    float64
 3   MDVP:Flo(Hz)      195 non-null    float64
 4   MDVP:Jitter(%)    195 non-null    float64
 5   MDVP:Jitter(Abs)  195 non-null    float64
 6   MDVP:RAP          195 non-null    float64
 7   MDVP:PPQ          195 non-null    float64
 8   Jitter:DDP        195 non-null    float64
 9   MDVP:Shimmer      195 non-null    float64
 10  MDVP:Shimmer(dB)  195 non-null    float64
 11  Shimmer:APQ3      195 non-null    float64
 12  Shimmer:APQ5      195 non-null    float64
 13  MDVP:APQ          195 non-null    float64
 14  Shimmer:DDA       195 non-null    float64
 15  NHR               195 non-null    float64
 16  HNR               195 non-null    float64
 17  RPDE              195 non-null    float64
 18  DFA               195 non-null    float64
 19  spread1           195 non-null    float64
 20  spread2           195 non-null    float64
 21  D2                195 non-null    float64
 22  PPE               195 non-null    float64
 23  status            195 non-null    int64  
dtypes: float64(22), int64(1), object(1)
memory usage: 36.7+ KB

All the columns/attributes have 195 non-null values.

In [14]:
# display Number of Null values in each of the attribute
pdata.isnull().sum()
Out[14]:
name                0
MDVP:Fo(Hz)         0
MDVP:Fhi(Hz)        0
MDVP:Flo(Hz)        0
MDVP:Jitter(%)      0
MDVP:Jitter(Abs)    0
MDVP:RAP            0
MDVP:PPQ            0
Jitter:DDP          0
MDVP:Shimmer        0
MDVP:Shimmer(dB)    0
Shimmer:APQ3        0
Shimmer:APQ5        0
MDVP:APQ            0
Shimmer:DDA         0
NHR                 0
HNR                 0
RPDE                0
DFA                 0
spread1             0
spread2             0
D2                  0
PPE                 0
status              0
dtype: int64

No attribute contains null values

In [15]:
# check whether the column has any value other than numeric values
pdata.iloc[:,1:][~pdata.iloc[:,1:].applymap(np.isreal).all(1)]
Out[15]:
MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE status

All columns are Numeric attributes except Name column
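
A dtype-based check is a possible alternative to the element-wise test above. The sketch below uses a small hypothetical frame (`demo`) in place of `pdata`; `select_dtypes` is a standard pandas method.

```python
import pandas as pd

# Hypothetical stand-in for pdata: one text column and two numeric columns.
demo = pd.DataFrame({
    "name": ["rec_1", "rec_2"],
    "MDVP:Fo(Hz)": [119.992, 122.400],
    "status": [1, 1],
})

# Columns whose dtype is not numeric (int or float).
non_numeric_cols = demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```

This inspects column dtypes only, so it is cheaper than testing every cell, but it would not catch numeric-looking strings stored in an object column.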

In [16]:
#describe() show the summary of statistics about all numeric attributes.
pdata.describe().transpose()
Out[16]:
count mean std min 25% 50% 75% max
MDVP:Fo(Hz) 195.0 154.228641 41.390065 88.333000 117.572000 148.790000 182.769000 260.105000
MDVP:Fhi(Hz) 195.0 197.104918 91.491548 102.145000 134.862500 175.829000 224.205500 592.030000
MDVP:Flo(Hz) 195.0 116.324631 43.521413 65.476000 84.291000 104.315000 140.018500 239.170000
MDVP:Jitter(%) 195.0 0.006220 0.004848 0.001680 0.003460 0.004940 0.007365 0.033160
MDVP:Jitter(Abs) 195.0 0.000044 0.000035 0.000007 0.000020 0.000030 0.000060 0.000260
MDVP:RAP 195.0 0.003306 0.002968 0.000680 0.001660 0.002500 0.003835 0.021440
MDVP:PPQ 195.0 0.003446 0.002759 0.000920 0.001860 0.002690 0.003955 0.019580
Jitter:DDP 195.0 0.009920 0.008903 0.002040 0.004985 0.007490 0.011505 0.064330
MDVP:Shimmer 195.0 0.029709 0.018857 0.009540 0.016505 0.022970 0.037885 0.119080
MDVP:Shimmer(dB) 195.0 0.282251 0.194877 0.085000 0.148500 0.221000 0.350000 1.302000
Shimmer:APQ3 195.0 0.015664 0.010153 0.004550 0.008245 0.012790 0.020265 0.056470
Shimmer:APQ5 195.0 0.017878 0.012024 0.005700 0.009580 0.013470 0.022380 0.079400
MDVP:APQ 195.0 0.024081 0.016947 0.007190 0.013080 0.018260 0.029400 0.137780
Shimmer:DDA 195.0 0.046993 0.030459 0.013640 0.024735 0.038360 0.060795 0.169420
NHR 195.0 0.024847 0.040418 0.000650 0.005925 0.011660 0.025640 0.314820
HNR 195.0 21.885974 4.425764 8.441000 19.198000 22.085000 25.075500 33.047000
RPDE 195.0 0.498536 0.103942 0.256570 0.421306 0.495954 0.587562 0.685151
DFA 195.0 0.718099 0.055336 0.574282 0.674758 0.722254 0.761881 0.825288
spread1 195.0 -5.684397 1.090208 -7.964984 -6.450096 -5.720868 -5.046192 -2.434031
spread2 195.0 0.226510 0.083406 0.006274 0.174351 0.218885 0.279234 0.450493
D2 195.0 2.381826 0.382799 1.423287 2.099125 2.361532 2.636456 3.671155
PPE 195.0 0.206552 0.090119 0.044539 0.137451 0.194052 0.252980 0.527367
status 195.0 0.753846 0.431878 0.000000 1.000000 1.000000 1.000000 1.000000

MDVP:Fo(Hz) - Average vocal fundamental frequency: the mean value is 154.228641

MDVP:Fhi(Hz) - Maximum vocal fundamental frequency: the maximum value is 592.030000

MDVP:Flo(Hz) - Minimum vocal fundamental frequency: the minimum value is 65.476000

PPE (a nonlinear measure of fundamental frequency): 75% of the data points lie at or below 0.252980

status - The mean is 0.753846, so about 75% of the recordings come from subjects affected by Parkinson's disease

In [17]:
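#Check the pairwise correlation of all numeric attributes
#(the text column 'name' is dropped explicitly; recent pandas versions raise an error on non-numeric columns)
pdata.drop(columns='name').corr()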
Out[17]:
MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE status
MDVP:Fo(Hz) 1.000000 0.400985 0.596546 -0.118003 -0.382027 -0.076194 -0.112165 -0.076213 -0.098374 -0.073742 -0.094717 -0.070682 -0.077774 -0.094732 -0.021981 0.059144 -0.383894 -0.446013 -0.413738 -0.249450 0.177980 -0.372356 -0.383535
MDVP:Fhi(Hz) 0.400985 1.000000 0.084951 0.102086 -0.029198 0.097177 0.091126 0.097150 0.002281 0.043465 -0.003743 -0.009997 0.004937 -0.003733 0.163766 -0.024893 -0.112404 -0.343097 -0.076658 -0.002954 0.176323 -0.069543 -0.166136
MDVP:Flo(Hz) 0.596546 0.084951 1.000000 -0.139919 -0.277815 -0.100519 -0.095828 -0.100488 -0.144543 -0.119089 -0.150747 -0.101095 -0.107293 -0.150737 -0.108670 0.210851 -0.400143 -0.050406 -0.394857 -0.243829 -0.100629 -0.340071 -0.380200
MDVP:Jitter(%) -0.118003 0.102086 -0.139919 1.000000 0.935714 0.990276 0.974256 0.990276 0.769063 0.804289 0.746625 0.725561 0.758255 0.746635 0.906959 -0.728165 0.360673 0.098572 0.693577 0.385123 0.433434 0.721543 0.278220
MDVP:Jitter(Abs) -0.382027 -0.029198 -0.277815 0.935714 1.000000 0.922911 0.897778 0.922913 0.703322 0.716601 0.697153 0.648961 0.648793 0.697170 0.834972 -0.656810 0.441839 0.175036 0.735779 0.388543 0.310694 0.748162 0.338653
MDVP:RAP -0.076194 0.097177 -0.100519 0.990276 0.922911 1.000000 0.957317 1.000000 0.759581 0.790652 0.744912 0.709927 0.737455 0.744919 0.919521 -0.721543 0.342140 0.064083 0.648328 0.324407 0.426605 0.670999 0.266668
MDVP:PPQ -0.112165 0.091126 -0.095828 0.974256 0.897778 0.957317 1.000000 0.957319 0.797826 0.839239 0.763580 0.786780 0.804139 0.763592 0.844604 -0.731510 0.333274 0.196301 0.716489 0.407605 0.412524 0.769647 0.288698
Jitter:DDP -0.076213 0.097150 -0.100488 0.990276 0.922913 1.000000 0.957319 1.000000 0.759555 0.790621 0.744894 0.709907 0.737439 0.744901 0.919548 -0.721494 0.342079 0.064026 0.648328 0.324377 0.426556 0.671005 0.266646
MDVP:Shimmer -0.098374 0.002281 -0.144543 0.769063 0.703322 0.759581 0.797826 0.759555 1.000000 0.987258 0.987625 0.982835 0.950083 0.987626 0.722194 -0.835271 0.447424 0.159954 0.654734 0.452025 0.507088 0.693771 0.367430
MDVP:Shimmer(dB) -0.073742 0.043465 -0.119089 0.804289 0.716601 0.790652 0.839239 0.790621 0.987258 1.000000 0.963198 0.973751 0.960977 0.963202 0.744477 -0.827805 0.410684 0.165157 0.652547 0.454314 0.512233 0.695058 0.350697
Shimmer:APQ3 -0.094717 -0.003743 -0.150747 0.746625 0.697153 0.744912 0.763580 0.744894 0.987625 0.963198 1.000000 0.960070 0.896645 1.000000 0.716207 -0.827123 0.435242 0.151124 0.610967 0.402243 0.467265 0.645377 0.347617
Shimmer:APQ5 -0.070682 -0.009997 -0.101095 0.725561 0.648961 0.709927 0.786780 0.709907 0.982835 0.973751 0.960070 1.000000 0.949146 0.960072 0.658080 -0.813753 0.399903 0.213873 0.646809 0.457195 0.502174 0.702456 0.351148
MDVP:APQ -0.077774 0.004937 -0.107293 0.758255 0.648793 0.737455 0.804139 0.737439 0.950083 0.960977 0.896645 0.949146 1.000000 0.896647 0.694019 -0.800407 0.451379 0.157276 0.673158 0.502188 0.536869 0.721694 0.364316
Shimmer:DDA -0.094732 -0.003733 -0.150737 0.746635 0.697170 0.744919 0.763592 0.744901 0.987626 0.963202 1.000000 0.960072 0.896647 1.000000 0.716215 -0.827130 0.435237 0.151132 0.610971 0.402223 0.467261 0.645389 0.347608
NHR -0.021981 0.163766 -0.108670 0.906959 0.834972 0.919521 0.844604 0.919548 0.722194 0.744477 0.716207 0.658080 0.694019 0.716215 1.000000 -0.714072 0.370890 -0.131882 0.540865 0.318099 0.470949 0.552591 0.189429
HNR 0.059144 -0.024893 0.210851 -0.728165 -0.656810 -0.721543 -0.731510 -0.721494 -0.835271 -0.827805 -0.827123 -0.813753 -0.800407 -0.827130 -0.714072 1.000000 -0.598736 -0.008665 -0.673210 -0.431564 -0.601401 -0.692876 -0.361515
RPDE -0.383894 -0.112404 -0.400143 0.360673 0.441839 0.342140 0.333274 0.342079 0.447424 0.410684 0.435242 0.399903 0.451379 0.435237 0.370890 -0.598736 1.000000 -0.110950 0.591117 0.479905 0.236931 0.545886 0.308567
DFA -0.446013 -0.343097 -0.050406 0.098572 0.175036 0.064083 0.196301 0.064026 0.159954 0.165157 0.151124 0.213873 0.157276 0.151132 -0.131882 -0.008665 -0.110950 1.000000 0.195668 0.166548 -0.165381 0.270445 0.231739
spread1 -0.413738 -0.076658 -0.394857 0.693577 0.735779 0.648328 0.716489 0.648328 0.654734 0.652547 0.610967 0.646809 0.673158 0.610971 0.540865 -0.673210 0.591117 0.195668 1.000000 0.652358 0.495123 0.962435 0.564838
spread2 -0.249450 -0.002954 -0.243829 0.385123 0.388543 0.324407 0.407605 0.324377 0.452025 0.454314 0.402243 0.457195 0.502188 0.402223 0.318099 -0.431564 0.479905 0.166548 0.652358 1.000000 0.523532 0.644711 0.454842
D2 0.177980 0.176323 -0.100629 0.433434 0.310694 0.426605 0.412524 0.426556 0.507088 0.512233 0.467265 0.502174 0.536869 0.467261 0.470949 -0.601401 0.236931 -0.165381 0.495123 0.523532 1.000000 0.480585 0.340232
PPE -0.372356 -0.069543 -0.340071 0.721543 0.748162 0.670999 0.769647 0.671005 0.693771 0.695058 0.645377 0.702456 0.721694 0.645389 0.552591 -0.692876 0.545886 0.270445 0.962435 0.644711 0.480585 1.000000 0.531039
status -0.383535 -0.166136 -0.380200 0.278220 0.338653 0.266668 0.288698 0.266646 0.367430 0.350697 0.347617 0.351148 0.364316 0.347608 0.189429 -0.361515 0.308567 0.231739 0.564838 0.454842 0.340232 0.531039 1.000000
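
Several pairs in the matrix above are almost perfectly correlated (for example MDVP:RAP with Jitter:DDP, and Shimmer:APQ3 with Shimmer:DDA, both at 1.000000). The following sketch shows one way to list such pairs automatically. The helper `high_corr_pairs`, the 0.95 threshold and the synthetic `demo` frame are illustrative assumptions, not part of the original notebook.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.95):
    """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
    corr = df.corr()
    pairs = []
    # combinations() visits each unordered pair once and skips the diagonal.
    for a, b in combinations(corr.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


rng = np.random.default_rng(42)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "rap": base,
    "ddp": 3 * base,                # exact multiple of "rap", so r = 1
    "noise": rng.normal(size=200),  # independent column
})

pairs = high_corr_pairs(demo)
print(pairs)
```

Applied to the numeric columns of `pdata`, a list like this would point to candidates for removal before fitting models that are sensitive to collinearity.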

3. Use univariate and bivariate analysis to check the individual attributes for their basic statistics, such as central values, spread, tails and relationships between variables, and record your observations

Univariate Analysis

In [18]:
pdata.kurtosis(numeric_only  = True)
Out[18]:
MDVP:Fo(Hz)         -0.627898
MDVP:Fhi(Hz)         7.627241
MDVP:Flo(Hz)         0.654615
MDVP:Jitter(%)      12.030939
MDVP:Jitter(Abs)    10.869043
MDVP:RAP            14.213798
MDVP:PPQ            11.963922
Jitter:DDP          14.224762
MDVP:Shimmer         3.238308
MDVP:Shimmer(dB)     5.128193
Shimmer:APQ3         2.720152
Shimmer:APQ5         3.874210
MDVP:APQ            11.163288
Shimmer:DDA          2.720661
NHR                 21.994974
HNR                  0.616036
RPDE                -0.921781
DFA                 -0.686152
spread1             -0.050199
spread2             -0.083023
D2                   0.220334
PPE                  0.528335
status              -0.595518
dtype: float64

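pandas reports excess kurtosis, for which a normal distribution scores 0. Positive values indicate heavier tails than a normal distribution (more extreme values); negative values indicate lighter tails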
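
To make the sign of the kurtosis values easier to read, here is a small sketch on synthetic samples (none of this data comes from `pdata`). It compares a normal sample with a heavier-tailed Laplace sample and a lighter-tailed uniform sample.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000

normal_sample = pd.Series(rng.normal(size=n))      # reference: excess kurtosis near 0
heavy_tailed = pd.Series(rng.laplace(size=n))      # heavier tails than normal
light_tailed = pd.Series(rng.uniform(-1, 1, n))    # lighter tails than normal

k_normal = normal_sample.kurtosis()
k_heavy = heavy_tailed.kurtosis()
k_light = light_tailed.kurtosis()

print("normal :", round(k_normal, 2))
print("laplace:", round(k_heavy, 2))
print("uniform:", round(k_light, 2))
```

The large positive values for the jitter attributes and NHR above therefore suggest a few extreme recordings rather than a generally wide spread.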

In [19]:
pdata.skew(numeric_only  = True)
Out[19]:
MDVP:Fo(Hz)         0.591737
MDVP:Fhi(Hz)        2.542146
MDVP:Flo(Hz)        1.217350
MDVP:Jitter(%)      3.084946
MDVP:Jitter(Abs)    2.649071
MDVP:RAP            3.360708
MDVP:PPQ            3.073892
Jitter:DDP          3.362058
MDVP:Shimmer        1.666480
MDVP:Shimmer(dB)    1.999389
Shimmer:APQ3        1.580576
Shimmer:APQ5        1.798697
MDVP:APQ            2.618047
Shimmer:DDA         1.580618
NHR                 4.220709
HNR                -0.514317
RPDE               -0.143402
DFA                -0.033214
spread1             0.432139
spread2             0.144430
D2                  0.430384
PPE                 0.797491
status             -1.187727
dtype: float64

A positive skewness value indicates that the data is skewed to the right (a longer right tail); a negative value indicates that it is skewed to the left
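
As a quick illustration on synthetic data (not on `pdata`), the sketch below shows that a right-skewed sample has a positive skewness and a mean above its median, and that mirroring the sample flips both relations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A log-normal sample has a long right tail.
right_skewed = pd.Series(rng.lognormal(mean=0.0, sigma=0.5, size=10_000))
# Negating the values mirrors the distribution, giving a long left tail.
left_skewed = -right_skewed

for label, s in [("right", right_skewed), ("left", left_skewed)]:
    print(label, "skew:", round(s.skew(), 2),
          "mean:", round(s.mean(), 3), "median:", round(s.median(), 3))
```

The same pattern is visible in the summary statistics for MDVP:Fhi(Hz), whose mean (197.10) lies well above its median (175.83).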

Average vocal fundamental frequency MDVP:Fo(Hz)

In [20]:
print("The mean of the average vocal fundamental frequency (Fo) is {:.2f} Hz and \n 90% of the recordings have an Fo at or below {:.2f} Hz".format(pdata['MDVP:Fo(Hz)'].mean(),pdata['MDVP:Fo(Hz)'].quantile(0.90)))
The mean of the average vocal fundamental frequency (Fo) is 154.23 Hz and 
 90% of the recordings have an Fo at or below 209.89 Hz
In [21]:
pdata['MDVP:Fo(Hz)'].plot(kind='box');

No outliers present for MDVP:Fo(Hz)
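
The box plot's verdict can be cross-checked by hand, assuming the plot uses Matplotlib's default whiskers at 1.5 × IQR. The quartiles, minimum and maximum below are copied from the describe() output for MDVP:Fo(Hz).

```python
# Summary statistics for MDVP:Fo(Hz), copied from the describe() table.
q1, q3 = 117.572, 182.769
col_min, col_max = 88.333, 260.105

iqr = q3 - q1                 # interquartile range
lower = q1 - 1.5 * iqr        # lower whisker limit
upper = q3 + 1.5 * iqr        # upper whisker limit

# A column has box-plot outliers only if some value falls outside the limits.
has_outliers = col_min < lower or col_max > upper

print(f"limits: [{lower:.4f}, {upper:.4f}]")
print("outliers present:", has_outliers)
```

Both the minimum and the maximum fall inside the limits, which agrees with the empty box plot margins.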

In [22]:
print('Skewness :',pdata['MDVP:Fo(Hz)'].skew())
print('Kurtosis :',pdata['MDVP:Fo(Hz)'].kurtosis())
sns.distplot(pdata['MDVP:Fo(Hz)'],kde = True,rug = True);
Skewness : 0.5917374636540784
Kurtosis : -0.6278981066788805

The skewness value is positive, hence the data is skewed to the right

The kurtosis value is negative, hence the tails are lighter than those of a normal distribution

Maximum vocal fundamental frequency MDVP:Fhi (Hz)

In [23]:
print("The mean of the maximum vocal fundamental frequency (Fhi) is {:.2f} Hz and \n 90% of the recordings have an Fhi at or below {:.2f} Hz".format(pdata['MDVP:Fhi(Hz)'].mean(),pdata['MDVP:Fhi(Hz)'].quantile(0.90)))
The mean of the maximum vocal fundamental frequency (Fhi) is 197.10 Hz and 
 90% of the recordings have an Fhi at or below 261.00 Hz
In [24]:
print(pdata['MDVP:Fhi(Hz)'].head(10))
pdata['MDVP:Fhi(Hz)'].plot(kind='box');
0    157.302
1    148.650
2    131.111
3    137.871
4    141.781
5    131.162
6    137.244
7    113.840
8    132.068
9    120.103
Name: MDVP:Fhi(Hz), dtype: float64

Several outliers are present for MDVP:Fhi(Hz)

In [25]:
print('Skewness :',pdata['MDVP:Fhi(Hz)'].skew())
print('Kurtosis :',pdata['MDVP:Fhi(Hz)'].kurtosis())
sns.distplot(pdata['MDVP:Fhi(Hz)'],kde = True,rug = True);
Skewness : 2.542145997588398
Kurtosis : 7.627241211631889

The skewness value is positive, hence the data is skewed to the right

The kurtosis value is positive, hence the tails are heavier than those of a normal distribution

In [26]:
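#Outlier Treatment: build the fences from the interquartile range (IQR)
#Note: the fences below use 1.0 * IQR, which is stricter than the conventional 1.5 * IQR rule
q3 = pdata['MDVP:Fhi(Hz)'].quantile(0.75)
q1 = pdata['MDVP:Fhi(Hz)'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr  # upper fence
out_below = q1-iqr  # lower fence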
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 313.5485
outliers_below : 45.51950000000002
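
The fences printed above correspond to a multiplier of 1.0 on the IQR. As a worked comparison, the sketch below recomputes them from the MDVP:Fhi(Hz) quartiles in the describe() table and sets them beside the more common 1.5 multiplier. The helper `iqr_fences` and its parameter `k` are hypothetical names introduced for illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of MDVP:Fhi(Hz) before treatment, from the describe() table.
q1, q3 = 134.8625, 224.2055

low_1, high_1 = iqr_fences(q1, q3, k=1.0)    # multiplier used in this notebook
low_15, high_15 = iqr_fences(q1, q3, k=1.5)  # conventional box-plot multiplier

print(f"k=1.0: lower={low_1:.4f}, upper={high_1:.4f}")
print(f"k=1.5: lower={low_15:.4f}, upper={high_15:.4f}")
```

With the 1.5 multiplier the upper fence moves from about 313.55 to about 358.22, so fewer observations would be flagged as outliers.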
In [27]:
print("Total observations above outlier :",pdata['MDVP:Fhi(Hz)'].loc[pdata['MDVP:Fhi(Hz)']>out_above].count())
print("Total observations below outlier : ",pdata['MDVP:Fhi(Hz)'].loc[pdata['MDVP:Fhi(Hz)']<out_below].count())
print("Data points above outlier :\n",pdata['MDVP:Fhi(Hz)'].loc[pdata['MDVP:Fhi(Hz)']>out_above])
Total observations above outlier : 12
Total observations below outlier :  0
Data points above outlier :
 16     349.259
73     588.518
102    586.567
115    492.892
116    442.557
117    450.247
118    442.824
120    479.697
149    565.740
186    592.030
187    581.289
193    396.961
Name: MDVP:Fhi(Hz), dtype: float64
In [28]:
# Replace every value above the upper fence with the mean of the remaining (non-outlier) values
mean_val = pdata['MDVP:Fhi(Hz)'].loc[pdata['MDVP:Fhi(Hz)']<=out_above].mean()
pdata['MDVP:Fhi(Hz)'] = pdata['MDVP:Fhi(Hz)'].mask(pdata['MDVP:Fhi(Hz)']>out_above,mean_val)
print("After Outlier Treatment")
print(pdata['MDVP:Fhi(Hz)'].head(10))
After Outlier Treatment
0    157.302
1    148.650
2    131.111
3    137.871
4    141.781
5    131.162
6    137.244
7    113.840
8    132.068
9    120.103
Name: MDVP:Fhi(Hz), dtype: float64
In [29]:
pdata['MDVP:Fhi(Hz)'].plot(kind='box');
In [30]:
print('Skewness :',pdata['MDVP:Fhi(Hz)'].skew())
print('Kurtosis :',pdata['MDVP:Fhi(Hz)'].kurtosis())
sns.distplot(pdata['MDVP:Fhi(Hz)'],kde = True,rug = True);
Skewness : 0.2984561761401523
Kurtosis : -1.0462818430124097

After outlier treatment the kurtosis is negative, which indicates lighter tails than a normal distribution (fewer extreme values)

Minimum vocal fundamental frequency MDVP:Flo (Hz)

In [31]:
print("The mean of the minimum vocal fundamental frequency (Flo) is {:.2f} Hz and \n 90% of the recordings have an Flo at or below {:.2f} Hz".format(pdata['MDVP:Flo(Hz)'].mean(),pdata['MDVP:Flo(Hz)'].quantile(0.90)))
The mean of the minimum vocal fundamental frequency (Flo) is 116.32 Hz and 
 90% of the recordings have an Flo at or below 187.88 Hz
In [32]:
print(pdata['MDVP:Flo(Hz)'].head(10))
pdata['MDVP:Flo(Hz)'].plot(kind='box');
0     74.997
1    113.819
2    111.555
3    111.366
4    110.655
5    113.787
6    114.820
7    104.315
8     91.754
9     91.226
Name: MDVP:Flo(Hz), dtype: float64

Several outliers are present for MDVP:Flo(Hz)

In [33]:
print('Skewness : ',pdata['MDVP:Flo(Hz)'].skew())
print('Kurtosis : ',pdata['MDVP:Flo(Hz)'].kurtosis())
sns.distplot(pdata['MDVP:Flo(Hz)'],kde = True,rug = True);
Skewness :  1.217350448627808
Kurtosis :  0.6546145211395396

The skewness value is positive, hence the data is skewed to the right

The kurtosis value is positive, hence the tails are heavier than those of a normal distribution

In [34]:
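#Outlier Treatment: build the fences from the interquartile range (IQR)
#Note: the fences below use 1.0 * IQR, which is stricter than the conventional 1.5 * IQR rule
q3 = pdata['MDVP:Flo(Hz)'].quantile(0.75)
q1 = pdata['MDVP:Flo(Hz)'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr  # upper fence
out_below = q1-iqr  # lower fence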
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 195.74600000000004
outliers_below : 28.563499999999976
In [35]:
print("Total observations above outlier :",pdata['MDVP:Flo(Hz)'].loc[pdata['MDVP:Flo(Hz)']>out_above].count())
print("Total observations below outlier : ",pdata['MDVP:Flo(Hz)'].loc[pdata['MDVP:Flo(Hz)']<out_below].count())
print("Data points above outlier :\n",pdata['MDVP:Flo(Hz)'].loc[pdata['MDVP:Flo(Hz)']>out_above])
Total observations above outlier : 15
Total observations below outlier :  0
Data points above outlier :
 33     197.079
34     196.160
42     225.227
43     232.483
44     232.435
45     227.911
46     231.848
62     205.495
63     223.634
64     221.156
111    199.020
166    229.256
167    237.303
169    219.783
170    239.170
Name: MDVP:Flo(Hz), dtype: float64
In [36]:
max_val = pdata['MDVP:Flo(Hz)'].loc[pdata['MDVP:Flo(Hz)']<=out_above].max()
pdata['MDVP:Flo(Hz)'] = pdata['MDVP:Flo(Hz)'].mask(pdata['MDVP:Flo(Hz)']>out_above,max_val)
print("After Outlier treatment")
print(pdata['MDVP:Flo(Hz)'].head(10))
pdata['MDVP:Flo(Hz)'].plot(kind='box');
After Outlier treatment
0     74.997
1    113.819
2    111.555
3    111.366
4    110.655
5    113.787
6    114.820
7    104.315
8     91.754
9     91.226
Name: MDVP:Flo(Hz), dtype: float64
In [37]:
print('Skewness : ',pdata['MDVP:Flo(Hz)'].skew())
print('Kurtosis : ',pdata['MDVP:Flo(Hz)'].kurtosis())
sns.distplot(pdata['MDVP:Flo(Hz)'], kde= True, rug = True);
Skewness :  0.9105008789129816
Kurtosis :  -0.3380621656813876

After outlier treatment, the kurtosis is negative, indicating lighter tails than a normal distribution. The skewness also drops from 1.22 to 0.91.
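
The same quantile / fence / mask steps are repeated for several columns in this notebook. A small helper is one way to avoid the repetition. This is a minimal sketch: the function name `cap_upper_outliers` and the parameter `k` are hypothetical, and `k=1.0` mirrors the `q3 + iqr` fence used in the cells here rather than the more common 1.5.

```python
import pandas as pd

def cap_upper_outliers(series: pd.Series, k: float = 1.0) -> pd.Series:
    """Replace values above q3 + k*IQR with the largest value inside the fence."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    upper_fence = q3 + k * (q3 - q1)
    max_inside = series[series <= upper_fence].max()
    return series.mask(series > upper_fence, max_inside)

# Toy data for illustration only
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
print(cap_upper_outliers(toy).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
```

With such a helper, each column could be treated in one line, for example `pdata[col] = cap_upper_outliers(pdata[col])`. The columns that are replaced with the mean instead of the maximum would need a variant of the function.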

Five measures of variation in fundamental frequency

MDVP:Jitter(%)

In [38]:
print(pdata['MDVP:Jitter(%)'].head(10))
pdata['MDVP:Jitter(%)'].plot(kind='box');
0    0.00784
1    0.00968
2    0.01050
3    0.00997
4    0.01284
5    0.00968
6    0.00333
7    0.00290
8    0.00551
9    0.00532
Name: MDVP:Jitter(%), dtype: float64

The box plot shows a large number of outliers for MDVP:Jitter(%).

In [39]:
print("The mean MDVP:Jitter(%) of a person is {:.2f} and \n 90% of the people have a Jitter at or below {:.2f}".format(pdata['MDVP:Jitter(%)'].mean(),pdata['MDVP:Jitter(%)'].quantile(0.90)))
The mean MDVP:Jitter(%) of a person is 0.01 and 
 90% of the people have a Jitter at or below 0.01
In [40]:
print('Skewness :',pdata['MDVP:Jitter(%)'].skew())
print('Kurtosis :',pdata['MDVP:Jitter(%)'].kurtosis())
sns.distplot(pdata['MDVP:Jitter(%)'],kde= True , rug= True);
Skewness : 3.0849462014441817
Kurtosis : 12.030939276179508

The skewness is positive, so the distribution is skewed to the right.

The kurtosis is strongly positive, so the tails are much heavier than those of a normal distribution.

In [41]:
#Outlier Treatment
q3 = pdata['MDVP:Jitter(%)'].quantile(0.75)
q1 = pdata['MDVP:Jitter(%)'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr
out_below = q1-iqr
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 0.01127
outliers_below : -0.0004450000000000001
In [42]:
print("Total observations above outlier",pdata['MDVP:Jitter(%)'].loc[pdata['MDVP:Jitter(%)']>out_above].count())
print("Total observations below outlier",pdata['MDVP:Jitter(%)'].loc[pdata['MDVP:Jitter(%)']<out_below].count())
print("Data points above Outlier limit")
print(pdata['MDVP:Jitter(%)'].loc[pdata['MDVP:Jitter(%)']>out_above])
Total observations above outlier 16
Total observations below outlier 0
Data points above Outlier limit
4      0.01284
97     0.01280
98     0.01378
99     0.01936
100    0.03316
101    0.01551
102    0.03011
146    0.01568
147    0.01466
148    0.01719
149    0.01627
150    0.01872
151    0.03107
152    0.02714
157    0.01813
192    0.01360
Name: MDVP:Jitter(%), dtype: float64
In [43]:
max_val = pdata['MDVP:Jitter(%)'].loc[pdata['MDVP:Jitter(%)']<=out_above].max()
pdata['MDVP:Jitter(%)'] = pdata['MDVP:Jitter(%)'].mask(pdata['MDVP:Jitter(%)']>out_above,max_val)
print("After outlier Treatment")
print(pdata['MDVP:Jitter(%)'].head(10))
pdata['MDVP:Jitter(%)'].plot(kind='box');
After outlier Treatment
0    0.00784
1    0.00968
2    0.01050
3    0.00997
4    0.01101
5    0.00968
6    0.00333
7    0.00290
8    0.00551
9    0.00532
Name: MDVP:Jitter(%), dtype: float64
In [44]:
print('Skewness : ',pdata['MDVP:Jitter(%)'].skew())
print('Kurtosis : ',pdata['MDVP:Jitter(%)'].kurtosis())
sns.distplot(pdata['MDVP:Jitter(%)'],kde= True , rug = True);
Skewness :  0.7135316854634219
Kurtosis :  -0.45799448113050856

After outlier treatment, the kurtosis is negative, indicating lighter tails than a normal distribution.

MDVP:Jitter(Abs)

In [45]:
print(pdata['MDVP:Jitter(Abs)'].head(10))
pdata['MDVP:Jitter(Abs)'].plot(kind='box');
0    0.00007
1    0.00008
2    0.00009
3    0.00009
4    0.00011
5    0.00008
6    0.00003
7    0.00003
8    0.00006
9    0.00006
Name: MDVP:Jitter(Abs), dtype: float64
In [46]:
print('Skewness : ',pdata['MDVP:Jitter(Abs)'].skew())
print('kurtosis : ',pdata['MDVP:Jitter(Abs)'].kurtosis())
sns.distplot(pdata['MDVP:Jitter(Abs)'],kde = True, rug =True);
Skewness :  2.6490714165257274
kurtosis :  10.869042517763667
In [47]:
#Outlier Treatment
q3 = pdata['MDVP:Jitter(Abs)'].quantile(0.75)
q1 = pdata['MDVP:Jitter(Abs)'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr
out_below = q1-iqr
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 9.999999999999999e-05
outliers_below : -1.9999999999999995e-05
In [48]:
print("Total observations above outlier",pdata['MDVP:Jitter(Abs)'].loc[pdata['MDVP:Jitter(Abs)']>out_above].count())
print("Total observations below outlier",pdata['MDVP:Jitter(Abs)'].loc[pdata['MDVP:Jitter(Abs)']<out_below].count())
print("Data points above Outlier limit")
print(pdata['MDVP:Jitter(Abs)'].loc[pdata['MDVP:Jitter(Abs)']>out_above])
Total observations above outlier 12
Total observations below outlier 0
Data points above Outlier limit
4      0.00011
79     0.00010
97     0.00010
98     0.00011
99     0.00015
100    0.00026
101    0.00012
102    0.00022
150    0.00010
151    0.00016
152    0.00014
157    0.00015
Name: MDVP:Jitter(Abs), dtype: float64
In [49]:
mean_val = pdata['MDVP:Jitter(Abs)'].loc[pdata['MDVP:Jitter(Abs)']<=out_above].mean()
pdata['MDVP:Jitter(Abs)'] = pdata['MDVP:Jitter(Abs)'].mask(pdata['MDVP:Jitter(Abs)']>out_above,mean_val)
print("After Outlier Treatment")
print(pdata['MDVP:Jitter(Abs)'].head(10))
pdata['MDVP:Jitter(Abs)'].plot(kind='box');
After Outlier Treatment
0    0.000070
1    0.000080
2    0.000090
3    0.000090
4    0.000037
5    0.000080
6    0.000030
7    0.000030
8    0.000060
9    0.000060
Name: MDVP:Jitter(Abs), dtype: float64
In [50]:
print('skewness : ',pdata['MDVP:Jitter(Abs)'].skew())
print('Kurtosis : ',pdata['MDVP:Jitter(Abs)'].kurtosis())
sns.distplot(pdata['MDVP:Jitter(Abs)'],kde = True ,rug= True);
skewness :  0.7300925097869511
Kurtosis :  0.008361144000160525

After outlier treatment, the kurtosis is close to zero, so the tails are now comparable to those of a normal distribution.

MDVP:RAP

In [51]:
pdata['MDVP:RAP'].plot(kind='box');
In [52]:
print('Skewness : ',pdata['MDVP:RAP'].skew())
print('Kurtosis : ',pdata['MDVP:RAP'].kurtosis())
sns.distplot(pdata['MDVP:RAP'],kde=True , rug =True);
Skewness :  3.360708450480554
Kurtosis :  14.213797721522418

The skewness is positive, so the distribution is skewed to the right.

The kurtosis is strongly positive, so the tails are much heavier than those of a normal distribution.

In [53]:
#Outlier Treatment
q3 = pdata['MDVP:RAP'].quantile(0.75)
q1 = pdata['MDVP:RAP'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr
out_below = q1-iqr
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 0.00601
outliers_below : -0.0005149999999999996
In [54]:
print("Total observations above outlier",pdata['MDVP:RAP'].loc[pdata['MDVP:RAP']>out_above].count())
print("Total observations below outlier",pdata['MDVP:RAP'].loc[pdata['MDVP:RAP']<out_below].count())
print("Data points above Outlier limit")
print(pdata['MDVP:RAP'].loc[pdata['MDVP:RAP']>out_above])
Total observations above outlier 18
Total observations below outlier 0
Data points above Outlier limit
4      0.00655
68     0.00647
79     0.00622
97     0.00743
98     0.00826
99     0.01159
100    0.02144
101    0.00905
102    0.01854
146    0.00863
147    0.00849
148    0.00996
149    0.00919
150    0.01075
151    0.01800
152    0.01568
157    0.01117
192    0.00624
Name: MDVP:RAP, dtype: float64
In [55]:
max_val = pdata['MDVP:RAP'].loc[pdata['MDVP:RAP']<=out_above].max()
pdata['MDVP:RAP'] = pdata['MDVP:RAP'].mask(pdata['MDVP:RAP']>out_above,max_val)
print("After Outlier Treatment")
print(pdata['MDVP:RAP'].head(10))
pdata['MDVP:RAP'].plot(kind='box');
After Outlier Treatment
0    0.00370
1    0.00465
2    0.00544
3    0.00502
4    0.00593
5    0.00463
6    0.00155
7    0.00144
8    0.00293
9    0.00268
Name: MDVP:RAP, dtype: float64
In [56]:
print('Skewness : ',pdata['MDVP:RAP'].skew())
print('Kurtosis : ',pdata['MDVP:RAP'].kurtosis())
sns.distplot(pdata['MDVP:RAP'],kde=True , rug =True);
Skewness :  0.7370769342317647
Kurtosis :  -0.5062984793572727

After outlier treatment, the kurtosis is negative, indicating lighter tails than a normal distribution.

MDVP:PPQ

In [57]:
pdata['MDVP:PPQ'].plot(kind='box');

The box plot shows a large number of outliers for MDVP:PPQ.

In [58]:
print('Skewness : ',pdata['MDVP:PPQ'].skew())
print('Kurtosis : ',pdata['MDVP:PPQ'].kurtosis())
sns.distplot(pdata['MDVP:PPQ'],kde=True , rug =True);
Skewness :  3.073892457888517
Kurtosis :  11.963922120220282

The skewness is positive, so the distribution is skewed to the right.

The kurtosis is strongly positive, so the tails are much heavier than those of a normal distribution.

In [59]:
#Outlier Treatment
q3 = pdata['MDVP:PPQ'].quantile(0.75)
q1 = pdata['MDVP:PPQ'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr
out_below = q1-iqr
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 0.00605
outliers_below : -0.00023499999999999997
In [60]:
print("Total observations above outlier",pdata['MDVP:PPQ'].loc[pdata['MDVP:PPQ']>out_above].count())
print("Total observations below outlier",pdata['MDVP:PPQ'].loc[pdata['MDVP:PPQ']<out_below].count())
print("Data points above Outlier limit")
print(pdata['MDVP:PPQ'].loc[pdata['MDVP:PPQ']>out_above])
Total observations above outlier 19
Total observations below outlier 0
Data points above Outlier limit
1      0.00696
2      0.00781
3      0.00698
4      0.00908
5      0.00750
97     0.00623
98     0.00655
99     0.00990
100    0.01522
101    0.00909
102    0.01628
146    0.00946
147    0.00819
148    0.01027
149    0.00963
150    0.01154
151    0.01958
152    0.01699
157    0.00718
Name: MDVP:PPQ, dtype: float64
In [61]:
max_val = pdata['MDVP:PPQ'].loc[pdata['MDVP:PPQ']<=out_above].max()
pdata['MDVP:PPQ'] = pdata['MDVP:PPQ'].mask(pdata['MDVP:PPQ']>out_above,max_val)
print("After Outlier Treatment")
print(pdata['MDVP:PPQ'].head(10))
pdata['MDVP:PPQ'].plot(kind='box');
After Outlier Treatment
0    0.00554
1    0.00576
2    0.00576
3    0.00576
4    0.00576
5    0.00576
6    0.00202
7    0.00182
8    0.00332
9    0.00332
Name: MDVP:PPQ, dtype: float64
In [62]:
print('Skewness : ',pdata['MDVP:PPQ'].skew())
print('Kurtosis : ',pdata['MDVP:PPQ'].kurtosis())
sns.distplot(pdata['MDVP:PPQ'],kde=True , rug =True);
Skewness :  0.6406137677940903
Kurtosis :  -0.6941626966478127

After outlier treatment, the kurtosis is negative, indicating lighter tails than a normal distribution.

Jitter:DDP

In [63]:
pdata['Jitter:DDP'].plot(kind='box');
In [64]:
print('Skewness : ',pdata['Jitter:DDP'].skew())
print('Kurtosis : ',pdata['Jitter:DDP'].kurtosis())
sns.distplot(pdata['Jitter:DDP'],kde=True , rug =True);
Skewness :  3.3620584478857203
Kurtosis :  14.224761911379424

The skewness is positive, so the distribution is skewed to the right.

The kurtosis is strongly positive, so the tails are much heavier than those of a normal distribution.

In [65]:
#Outlier Treatment
q3 = pdata['Jitter:DDP'].quantile(0.75)
q1 = pdata['Jitter:DDP'].quantile(0.25)
iqr = q3-q1
out_above = q3+iqr
out_below = q1-iqr
print("outliers_above : {}".format(out_above))
print("outliers_below : {}".format(out_below))
outliers_above : 0.018025
outliers_below : -0.001535
In [66]:
print("Total observations above outlier",pdata['Jitter:DDP'].loc[pdata['Jitter:DDP']>out_above].count())
print("Total observations below outlier",pdata['Jitter:DDP'].loc[pdata['Jitter:DDP']<out_below].count())
print("Data points above Outlier limit")
print(pdata['Jitter:DDP'].loc[pdata['Jitter:DDP']>out_above])
Total observations above outlier 18
Total observations below outlier 0
Data points above Outlier limit
4      0.01966
68     0.01941
79     0.01865
97     0.02228
98     0.02478
99     0.03476
100    0.06433
101    0.02716
102    0.05563
146    0.02589
147    0.02546
148    0.02987
149    0.02756
150    0.03225
151    0.05401
152    0.04705
157    0.03351
192    0.01873
Name: Jitter:DDP, dtype: float64
In [67]:
max_val = pdata['Jitter:DDP'].loc[pdata['Jitter:DDP']<=out_above].max()
pdata['Jitter:DDP'] = pdata['Jitter:DDP'].mask(pdata['Jitter:DDP']>out_above,max_val)
print("After Outlier Treatment")
print(pdata['Jitter:DDP'].head(10))
pdata['Jitter:DDP'].plot(kind='box');
After Outlier Treatment
0    0.01109
1    0.01394
2    0.01633
3    0.01505
4    0.01778
5    0.01388
6    0.00466
7    0.00431
8    0.00880
9    0.00803
Name: Jitter:DDP, dtype: float64
In [68]:
print('Skewness : ',pdata['Jitter:DDP'].skew())
print('Kurtosis : ',pdata['Jitter:DDP'].kurtosis())
sns.distplot(pdata['Jitter:DDP'],kde=True , rug =True);
Skewness :  0.7360596525004133
Kurtosis :  -0.5082945859294927

After outlier treatment, the kurtosis is negative, indicating lighter tails than a normal distribution.

In [69]:
#Analysis of Shimmer
affected_MDVP = pdata[pdata['status']==1]['MDVP:Shimmer(dB)'].values
not_affected_MDVP = pdata[pdata['status']==0]['MDVP:Shimmer(dB)'].values
sns.distplot(affected_MDVP);
plt.title('Shimmer values for affected cases')
plt.xlabel('Shimmer values in DB per affected cases')
plt.show()
sns.boxplot(affected_MDVP);
plt.title('Shimmer values for affected cases')
plt.xlabel('Shimmer values in DB per affected cases')
plt.show()
sns.distplot(not_affected_MDVP);
plt.title('Shimmer values for not affected cases')
plt.xlabel('Shimmer values in DB per not affected cases')
plt.show()
sns.boxplot(not_affected_MDVP);
plt.title('Shimmer values for not affected cases')
plt.xlabel('Shimmer values in DB per not affected cases')
plt.show()
#'height' replaces the 'size' argument, which was renamed in seaborn 0.9
sns.FacetGrid(pdata, hue="status", height=5).map(sns.distplot, "MDVP:Shimmer(dB)").add_legend();
plt.show()

Three nonlinear measures of fundamental frequency variation

Spread1

In [70]:
pdata['spread1'].plot(kind='box');
In [71]:
print('Skewness : ',pdata['spread1'].skew())
print('Kurtosis : ',pdata['spread1'].kurtosis())
sns.distplot(pdata['spread1'],kde=True , rug =True);
Skewness :  0.4321389320131796
Kurtosis :  -0.05019918161280801

Spread2

In [72]:
pdata['spread2'].plot(kind='box');
In [73]:
print('Skewness : ',pdata['spread2'].skew())
print('Kurtosis : ',pdata['spread2'].kurtosis())
sns.distplot(pdata['spread2'],kde=True , rug =True);
Skewness :  0.14443048549278412
Kurtosis :  -0.08302289327680024

PPE

In [74]:
pdata['PPE'].plot(kind='box');
In [75]:
print('Skewness : ',pdata['PPE'].skew())
print('Kurtosis : ',pdata['PPE'].kurtosis())
sns.distplot(pdata['PPE'],kde=True , rug =True);
Skewness :  0.7974910716463578
Kurtosis :  0.5283349472852588

Target Column - Status

In [76]:
#status - Health status of the subject (one) - Parkinson's, (zero) - healthy
pd.crosstab(pdata['status'],columns='count')
Out[76]:
col_0 count
status
0 48
1 147
In [77]:
#Target Column Distribution
sns.countplot(x=pdata['status']);

The target column is imbalanced: 147 of the 195 recordings belong to patients affected by Parkinson's disease and 48 to healthy subjects. A classifier trained on these recordings could serve as a screening step prior to an appointment with a clinician.

Bivariate Analysis

In [78]:
#Bivariate analysis to examine the relationship between each independent attribute and the target column
for i in pdata:
    if i != 'status' and i != 'name':
        sns.catplot(x="status",y=i,kind ='box',data=pdata);

The box plots show that patients affected by Parkinson's disease tend to have lower values of 'HNR', 'MDVP:Flo(Hz)', 'MDVP:Fhi(Hz)' and 'MDVP:Fo(Hz)' than healthy subjects.
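
The visual comparison can be backed by a per-class summary table. The sketch below uses a toy frame that borrows the notebook's column names with made-up values; applied to `pdata`, the same `groupby("status").median()` call would give the numeric counterpart of the box plots.

```python
import pandas as pd

# Toy frame with the notebook's column names; the values are made up
toy = pd.DataFrame({
    "status": [0, 0, 0, 1, 1, 1],
    "HNR": [26.0, 25.0, 27.0, 20.0, 19.0, 22.0],
    "MDVP:Fo(Hz)": [180.0, 200.0, 190.0, 140.0, 120.0, 150.0],
})

# Median of each feature per class (0 = healthy, 1 = Parkinson's)
medians = toy.groupby("status").median()
print(medians)
```

Medians are used here rather than means because several attributes in this dataset are skewed.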

In [90]:
#Bivariate Distribution of Target column (Status) with respect to all other Independent Numeric attributes
#Using Scatter Plot
plt.figure(figsize=(10,20))

plt.subplot(6,1,1)
#x and y are passed as keywords; recent seaborn releases no longer accept them positionally
sns.scatterplot(x=pdata['MDVP:Fo(Hz)'], y=pdata['MDVP:Fhi(Hz)'], hue=pdata['status'], palette=['red','blue']);

plt.subplot(6,1,2)
sns.scatterplot(x=pdata['MDVP:Fo(Hz)'], y=pdata['MDVP:Flo(Hz)'], hue=pdata['status'], palette=['blue','green']);

plt.subplot(6,1,3)
sns.scatterplot(x=pdata['MDVP:Jitter(%)'], y=pdata['MDVP:Jitter(Abs)'], hue=pdata['status'], palette=['green','yellow']);

plt.subplot(6,1,4)
sns.scatterplot(x=pdata['MDVP:RAP'], y=pdata['MDVP:PPQ'], hue=pdata['status'], palette=['green','red']);

plt.subplot(6,1,5)
sns.scatterplot(x=pdata['NHR'], y=pdata['HNR'], hue=pdata['status'], palette=['magenta','yellow']);

plt.subplot(6,1,6)
sns.scatterplot(x=pdata['RPDE'], y=pdata['D2'], hue=pdata['status'], palette=['cyan','blue']);
In [98]:
plt.figure(figsize=(10,20))

plt.subplot(2,1,1)
sns.scatterplot(x=pdata['spread1'], y=pdata['PPE'], hue=pdata['status'], palette=['red','blue']);

plt.subplot(2,1,2)
sns.scatterplot(x=pdata['spread2'], y=pdata['PPE'], hue=pdata['status'], palette=['blue','green']);

Looking at the relationships among the nonlinear measures of fundamental frequency variation (PPE vs. spread1 and PPE vs. spread2), the plots are dominated by patients affected by Parkinson's disease.

From these plots, PPE appears to be one of the most informative attributes for predicting the target class (status).

In [407]:
#Pair plot showing the bivariate distributions as scatter plots and the univariate distributions as KDE curves on the diagonal.
sns.pairplot(pdata, hue = "status",diag_kind="kde");

Correlation Matrix

In [103]:
#Use correlation method to observe the relationship between different attributes.
#Apply HeatMap to check the relationship between different attributes.
plt.figure(figsize=(10,8))
sns.heatmap(pdata.drop(columns=['name']).corr(),  # drop the non-numeric 'name' column before computing correlations
            annot=True,
            linewidths=.5,
            center=0,
            cbar=False,
            cmap="YlGnBu");
plt.show()
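
With 22 features the annotated heatmap is dense, so it can help to list the strongly correlated pairs explicitly. The following is a minimal sketch on toy data: the helper name `top_correlated_pairs` and the 0.9 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep the strict upper triangle so each pair appears once, without the diagonal
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Toy data for illustration only: b is an exact multiple of a, c is weakly related
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
})
print(top_correlated_pairs(toy))
```

Calling the helper on `pdata` would return a Series indexed by feature pairs, which is easier to scan than the full matrix when deciding whether some attributes are redundant.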

After outlier Treatment

In [104]:
pdata.kurtosis(numeric_only  = True)
Out[104]:
MDVP:Fo(Hz)         -0.627898
MDVP:Fhi(Hz)        -1.046282
MDVP:Flo(Hz)        -0.338062
MDVP:Jitter(%)      -0.457994
MDVP:Jitter(Abs)     0.008361
MDVP:RAP            -0.506298
MDVP:PPQ            -0.694163
Jitter:DDP          -0.508295
MDVP:Shimmer         3.238308
MDVP:Shimmer(dB)     5.128193
Shimmer:APQ3         2.720152
Shimmer:APQ5         3.874210
MDVP:APQ            11.163288
Shimmer:DDA          2.720661
NHR                 21.994974
HNR                  0.616036
RPDE                -0.921781
DFA                 -0.686152
spread1             -0.050199
spread2             -0.083023
D2                   0.220334
PPE                  0.528335
status              -0.595518
dtype: float64
In [108]:
pdata.skew(numeric_only  = True)
Out[108]:
MDVP:Fo(Hz)         0.591737
MDVP:Fhi(Hz)        0.298456
MDVP:Flo(Hz)        0.910501
MDVP:Jitter(%)      0.713532
MDVP:Jitter(Abs)    0.730093
MDVP:RAP            0.737077
MDVP:PPQ            0.640614
Jitter:DDP          0.736060
MDVP:Shimmer        1.666480
MDVP:Shimmer(dB)    1.999389
Shimmer:APQ3        1.580576
Shimmer:APQ5        1.798697
MDVP:APQ            2.618047
Shimmer:DDA         1.580618
NHR                 4.220709
HNR                -0.514317
RPDE               -0.143402
DFA                -0.033214
spread1             0.432139
spread2             0.144430
D2                  0.430384
PPE                 0.797491
status             -1.187727
dtype: float64
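
One way to read the skewness table is to flag the columns that are still strongly skewed after treatment. The sketch below copies a handful of values from the output above. The cut-off of 1 is a common rule of thumb and an assumption here, not something this notebook prescribes.

```python
import pandas as pd

# A subset of the skewness values printed above
skew = pd.Series({
    "MDVP:Shimmer": 1.666480,
    "MDVP:APQ": 2.618047,
    "NHR": 4.220709,
    "HNR": -0.514317,
    "PPE": 0.797491,
})

# Keep the features whose absolute skewness exceeds the assumed cut-off of 1
still_skewed = skew[skew.abs() > 1].sort_values(ascending=False)
print(still_skewed)
```

On the full table the same filter would pick out the shimmer measures, MDVP:APQ and NHR, which were not part of the outlier treatment in the earlier cells.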

4. Split the dataset into training and test set in the ratio of 70:30 (Training:Test)

In [120]:
from sklearn.model_selection import train_test_split


# Setting Independent features
X = pdata.drop(['status','name'], axis = 1)
#Set Target class label
y = pdata['status']

# Splitting the data into training and test set in the ratio of 70:30 respectively
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = random_state)
X_train.head()
Out[120]:
MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE
38 180.198 201.249 175.456 0.00284 0.00002 0.00153 0.00166 0.00459 0.01444 0.131 0.00726 0.00885 0.01190 0.02177 0.00231 26.738 0.403884 0.766209 -6.452058 0.212294 2.269398 0.141929
31 199.228 209.512 192.091 0.00241 0.00001 0.00134 0.00138 0.00402 0.01015 0.089 0.00504 0.00641 0.00762 0.01513 0.00167 30.940 0.432439 0.742055 -7.682587 0.173319 2.103106 0.068501
173 113.715 116.443 96.913 0.00349 0.00003 0.00171 0.00203 0.00514 0.01472 0.133 0.00748 0.00905 0.01148 0.02245 0.00478 26.547 0.380253 0.766700 -5.943501 0.192150 1.852542 0.179677
12 136.926 159.866 131.276 0.00293 0.00002 0.00118 0.00153 0.00355 0.01259 0.112 0.00656 0.00717 0.01140 0.01968 0.00581 25.703 0.460600 0.646846 -6.547148 0.152813 2.041277 0.138512
109 193.030 208.900 80.297 0.00766 0.00004 0.00450 0.00389 0.01351 0.03044 0.275 0.01771 0.01815 0.02084 0.05312 0.00947 21.934 0.497554 0.740539 -5.845099 0.278679 2.608749 0.185668
In [121]:
#Display Target column's train data
pd.crosstab(y_train,columns='count',colnames=['Train data'])
Out[121]:
Train data count
status
0 33
1 103
In [122]:
# Shape and size of Training dataset
print("Training data size\n",X_train.shape,y_train.shape)
Training data size
 (136, 22) (136,)
In [123]:
#Display the Independent features test dataset
X_test.head()
Out[123]:
MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer MDVP:Shimmer(dB) Shimmer:APQ3 Shimmer:APQ5 MDVP:APQ Shimmer:DDA NHR HNR RPDE DFA spread1 spread2 D2 PPE
138 112.239 126.609000 104.095 0.00472 0.00004 0.00238 0.00290 0.00715 0.05643 0.517 0.03070 0.03530 0.04451 0.09211 0.02629 17.366 0.640945 0.701404 -5.634576 0.306014 2.419253 0.209191
16 144.188 177.414634 82.764 0.00544 0.00004 0.00211 0.00292 0.00632 0.02047 0.192 0.00969 0.01200 0.02074 0.02908 0.01859 22.333 0.567380 0.644692 -5.440040 0.239764 2.264501 0.218164
155 117.870 127.349000 95.654 0.00647 0.00005 0.00356 0.00300 0.01067 0.03087 0.276 0.01659 0.01804 0.02402 0.04977 0.02631 22.431 0.628300 0.652025 -3.583722 0.207914 2.439597 0.206256
96 159.116 168.913000 144.811 0.00342 0.00002 0.00178 0.00184 0.00535 0.03381 0.307 0.01806 0.02024 0.02809 0.05417 0.00852 22.663 0.366329 0.693429 -6.417440 0.194627 2.473239 0.151709
68 143.533 162.215000 65.809 0.01101 0.00008 0.00593 0.00467 0.01778 0.05384 0.478 0.03152 0.02422 0.03392 0.09455 0.04882 20.338 0.513237 0.731444 -5.869750 0.151814 2.118496 0.185580
In [124]:
#Display Target column's Test data
pd.crosstab(y_test,columns='count',colnames=['Test data'])
Out[124]:
Test data count
status
0 15
1 44
In [126]:
#Print Test data size

print("\nTesting data size\n",X_test.shape,y_test.shape)
Testing data size
 (59, 22) (59,)
In [127]:
#check split of dataset
print("{0:0.2f}% data is in training set".format((len(X_train)/len(pdata.index)) * 100))
print("{0:0.2f}% data is in test set".format((len(X_test)/len(pdata.index)) * 100))
69.74% data is in training set
30.26% data is in test set
In [128]:
#Detailed Summary count of Original, Train and Test DataSet
print(" Parkinsons Disease Affected Person count: {0} ({1:0.2f}%)".format(len(pdata.loc[pdata['status'] == 1]), (len(pdata.loc[pdata['status'] == 1])/len(pdata.index)) * 100))
print("Parkinsons Disease not affected Person Count : {0} ({1:0.2f}%)".format(len(pdata.loc[pdata['status'] == 0]), (len(pdata.loc[pdata['status'] == 0])/len(pdata.index)) * 100))
print("")
print("Training data- Parkinsons Disease affected Person count   : {0} ({1:0.2f}%)".format(len(y_train[y_train[:] == 1]), (len(y_train[y_train[:] == 1])/len(y_train)) * 100))
print("Training data -Parkinsons Disease not affected Person count   : {0} ({1:0.2f}%)".format(len(y_train[y_train[:] == 0]), (len(y_train[y_train[:] == 0])/len(y_train)) * 100))
print("")
print("Testing data- Parkinsons Disease affected Person count: {0} ({1:0.2f}%)".format(len(y_test[y_test[:] == 1]), (len(y_test[y_test[:] == 1])/len(y_test)) * 100))
print("Testing data- Parkinsons Disease not affected Person count : {0} ({1:0.2f}%)".format(len(y_test[y_test[:] == 0]), (len(y_test[y_test[:] == 0])/len(y_test)) * 100))
print("")
 Parkinsons Disease Affected Person count: 147 (75.38%)
Parkinsons Disease not affected Person Count : 48 (24.62%)

Training data- Parkinsons Disease affected Person count   : 103 (75.74%)
Training data -Parkinsons Disease not affected Person count   : 33 (24.26%)

Testing data- Parkinsons Disease affected Person count: 44 (74.58%)
Testing data- Parkinsons Disease not affected Person count : 15 (25.42%)
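
The random split above happens to keep the class ratio almost identical in both parts. Passing `stratify` to `train_test_split` enforces this by construction, which matters more on small, imbalanced datasets. The sketch below uses synthetic labels with the same 48 / 147 class counts; `X_demo` and `y_demo` are placeholder names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the target column
y_demo = np.array([0] * 48 + [1] * 147)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature, illustration only

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print("train positive share:", round(y_tr.mean(), 3))
print("test positive share: ", round(y_te.mean(), 3))
```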

5. Prepare the data for training - Scale the data if necessary, get rid of missing values (if any)

In [180]:
# Applying RobustScaler method to make it less prone to outliers
from sklearn.preprocessing import RobustScaler  
features = X.columns
#RobustScaler() scales features using IQR that are robust to outliers
scaler = RobustScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns = features)

# Standardize the robust-scaled features to zero mean and unit variance (z-score)
Xscale = X.apply(zscore)

display(X.shape, Xscale.shape, y.shape)
(195, 22)
(195, 22)
(195,)
In [408]:
#Apply Standard scaler method to Standardize features by removing the mean and scaling to unit variance
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
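
The scaler above is fitted on the training split only, which keeps test-set statistics out of the preprocessing. The same idea carries over to cross-validation by wrapping the scaler and the model in a `Pipeline`. The sketch below runs on synthetic two-class data; the step names `scale` and `clf` are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic two-class data for illustration only
X_demo = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(2.0, 1.0, (60, 4))])
y_demo = np.array([0] * 60 + [1] * 60)

# The scaler is refit on the training part of every fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="accuracy")
print("mean CV accuracy:", scores.mean().round(3))
```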

6. Train at least 3 standard classification algorithms - Logistic Regression, Naive Bayes, SVM, k-NN etc. - and note down their accuracies on the test data

Logistic Regression

In [387]:
#Create Logistic Regression Model

LR = LogisticRegression(solver="liblinear")
LR.fit(X_train, y_train)
#predict target class on test data
y_pred = LR.predict(X_test)

accuracy_LR = accuracy_score(y_test, y_pred)

print('Training Score: ', LR.score(X_train, y_train).round(3))
print('Test Score: ', LR.score(X_test, y_test).round(3))
print('Classification Report of LR :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_LR.round(3))
#Print Confusion Matrix
cm_LR = metrics.confusion_matrix(y_test, y_pred)

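#confusion_matrix orders rows/columns by sorted class label: 0 (not affected) first, then 1 (affected)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]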
cm1_LR = pd.DataFrame(cm_LR, index = label, columns = label)
sns.heatmap(cm1_LR, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

#Store the accuracy results for each model in a dataframe for final comparison
resultDf = pd.DataFrame({'Method':['Logistic Regression'], 'accuracy': [accuracy_LR] })
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.882
Test Score:  0.881
Classification Report of LR :
              precision    recall  f1-score   support

           0       0.83      0.67      0.74        15
           1       0.89      0.95      0.92        44

    accuracy                           0.88        59
   macro avg       0.86      0.81      0.83        59
weighted avg       0.88      0.88      0.88        59

Accuracy:  0.881
Out[387]:
Method accuracy
0 Logistic Regression 0.881356
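
When labelling a confusion matrix it helps to remember that scikit-learn orders rows and columns by sorted class value, so with a 0/1 target the first row and column belong to class 0 (healthy). The sketch below uses toy labels and passes `labels=[0, 1]` to make that ordering explicit; the display names are illustrative.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only: 0 = healthy, 1 = Parkinson's
y_true = [0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 1, 0, 1, 1, 0, 1]

# Row/column 0 is class 0 and row/column 1 is class 1
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
names = ["Healthy (0)", "Parkinson's (1)"]
cm_df = pd.DataFrame(cm, index=names, columns=names)
print(cm_df)

tn, fp, fn, tp = cm.ravel()
print("recall for class 1:", tp / (tp + fn))  # 0.75
```

For a screening use case the recall on class 1 is usually the figure to watch, since a false negative means a missed patient.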

Naive Bayes Model

In [388]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

NB = GaussianNB()
NB.fit(X_train, y_train)
#predict target class on test data
y_pred = NB.predict(X_test)

accuracy_NB = accuracy_score(y_test, y_pred)

print('Training Score: ', NB.score(X_train, y_train).round(3))
print('Test Score: ', NB.score(X_test, y_test).round(3))
print('Classification Report of Gaussian Naive Bayes model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_NB.round(3))
#Print Confusion Matrix
cm_NB = metrics.confusion_matrix(y_test, y_pred)

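#confusion_matrix orders rows/columns by sorted class label: 0 (not affected) first, then 1 (affected)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]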
cm1_NB = pd.DataFrame(cm_NB, index = label, columns = label)
sns.heatmap(cm1_NB, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['Naive Bayes'], 'accuracy': [accuracy_NB]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.684
Test Score:  0.797
Classification Report of Gaussian Naive Bayes model :
              precision    recall  f1-score   support

           0       0.58      0.73      0.65        15
           1       0.90      0.82      0.86        44

    accuracy                           0.80        59
   macro avg       0.74      0.78      0.75        59
weighted avg       0.82      0.80      0.80        59

Accuracy:  0.797
Out[388]:
Method accuracy
0 Logistic Regression 0.881356
0 Naive Bayes 0.796610

Support Vector Machine

In [389]:
#Build SVM Model
from sklearn.svm import SVC 
SVM = SVC(gamma=0.025, C=3)  
  
SVM.fit(X_train , y_train)
#predict target class on test data
y_pred = SVM.predict(X_test)

accuracy_SVM = accuracy_score(y_test, y_pred)

print('Training Score: ', SVM.score(X_train, y_train).round(3))
print('Test Score: ', SVM.score(X_test, y_test).round(3))
print('Classification Report of SVM model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_SVM.round(3))

cm_SVM = metrics.confusion_matrix(y_test, y_pred)

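#confusion_matrix orders rows/columns by sorted class label: 0 (not affected) first, then 1 (affected)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]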
cm1_SVM = pd.DataFrame(cm_SVM, index = label, columns = label)
sns.heatmap(cm1_SVM, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['SVM'], 'accuracy': [accuracy_SVM]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.897
Test Score:  0.898
Classification Report of SVM model :
              precision    recall  f1-score   support

           0       0.91      0.67      0.77        15
           1       0.90      0.98      0.93        44

    accuracy                           0.90        59
   macro avg       0.90      0.82      0.85        59
weighted avg       0.90      0.90      0.89        59

Accuracy:  0.898
Out[389]:
Method accuracy
0 Logistic Regression 0.881356
0 Naive Bayes 0.796610
0 SVM 0.898305

k-Nearest Neighbor Classifier

In [390]:
from sklearn.neighbors import KNeighborsClassifier
#Build KNN model
KNN = KNeighborsClassifier()
KNN.fit(X_train, y_train)   
#predict target class on test data
y_pred = KNN.predict(X_test)

accuracy_KNN = accuracy_score(y_test, y_pred)

print('Training Score: ', KNN.score(X_train, y_train).round(3))
print('Test Score: ', KNN.score(X_test, y_test).round(3))
print('Classification Report of KNN Classifier Model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_KNN.round(3))

cm_KNN = metrics.confusion_matrix(y_test, y_pred)

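#confusion_matrix orders rows/columns by sorted class label: 0 (not affected) first, then 1 (affected)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]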
cm1_KNN = pd.DataFrame(cm_KNN, index = label, columns = label)
sns.heatmap(cm1_KNN, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['KNN'], 'accuracy': [accuracy_KNN]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.963
Test Score:  0.932
Classification Report of KNN Classifier Model :
              precision    recall  f1-score   support

           0       0.92      0.80      0.86        15
           1       0.93      0.98      0.96        44

    accuracy                           0.93        59
   macro avg       0.93      0.89      0.91        59
weighted avg       0.93      0.93      0.93        59

Accuracy:  0.932
Out[390]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
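
The confusion-matrix heatmaps in these cells attach text labels to the output of `confusion_matrix`, so the label order matters. scikit-learn sorts the classes, which for a 0/1 target puts class 0 in the first row and column. A minimal sketch on made-up labels (the arrays `y_true_demo` and `y_pred_demo` are illustrative and are not the project's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels, purely illustrative: 1 = positive class, 0 = negative class
y_true_demo = np.array([0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 1, 1, 1, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Rows are true classes and columns are predicted classes, both in sorted order [0, 1]
tn, fp, fn, tp = cm_demo.ravel()
print(cm_demo)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```

Reading the four cells this way makes it easy to check that each heatmap label is attached to the intended class.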

7. Train a meta-classifier and note the accuracy on test data

Decision Tree Classifier

In [391]:
#Build Decision Tree Classifier
#Prune the decision tree by limiting the max. depth of trees to avoid over-fitting
DT = DecisionTreeClassifier(criterion = "gini", random_state = random_state,max_depth=3, min_samples_leaf=5)
DT.fit(X_train, y_train)
#predict target class on test data
y_pred = DT.predict(X_test)
feature_cols = X.columns

accuracy_DT = accuracy_score(y_test, y_pred)

print('Training Score: ', DT.score(X_train, y_train).round(3))
print('Test Score: ', DT.score(X_test, y_test).round(3))
print('Classification Report of Decision Tree classifier :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_DT.round(3))

cm_DT = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_DT = pd.DataFrame(cm_DT, index = label, columns = label)
sns.heatmap(cm1_DT, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()


tempResultDf = pd.DataFrame({'Method':['Decision Tree'], 'accuracy': [accuracy_DT]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.89
Test Score:  0.847
Classification Report of Decision Tree classifier :
              precision    recall  f1-score   support

           0       0.69      0.73      0.71        15
           1       0.91      0.89      0.90        44

    accuracy                           0.85        59
   macro avg       0.80      0.81      0.80        59
weighted avg       0.85      0.85      0.85        59

Accuracy:  0.847
Out[391]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
In [392]:
print('Feature Importance for Decision Tree ', '--'*38)
feature_importances = pd.DataFrame(DT.feature_importances_, index = X.columns, 
                                   columns=['Importance']).sort_values('Importance', ascending = True)
feature_importances.sort_values(by = 'Importance', ascending = True).plot(kind = 'barh', figsize = (15, 7.2))
Feature Importance for Decision Tree  ----------------------------------------------------------------------------
Out[392]:
<matplotlib.axes._subplots.AxesSubplot at 0xdc01eec408>

The single Decision Tree relies mostly on the PPE attribute to predict the target class and makes little use of the other attributes. With max_depth=3 it has at most seven split nodes, so it can use at most seven features. Disadvantage: low feature selection.
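
To see how few features a shallow tree actually uses, one can count the non-zero entries of `feature_importances_`. The sketch below does this on synthetic data from `make_classification` (an assumption for illustration; the `_demo` names and the data are not the project's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the voice-recording features
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     n_informative=3, random_state=42)

tree_demo = DecisionTreeClassifier(criterion="gini", max_depth=3,
                                   min_samples_leaf=5, random_state=42)
tree_demo.fit(X_demo, y_demo)

importances = tree_demo.feature_importances_
used = np.flatnonzero(importances)   # indices of features the tree split on

print("Features used:", used.size, "of", X_demo.shape[1])
print("Most important feature index:", int(importances.argmax()))
```

Features with zero importance never appear in a split, so the tree ignores them entirely when predicting.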

Meta-Classifier

In [393]:
#Stacking is designed to improve modeling performance
#Train a meta-classifier
level0 = list()
level0.append(('LR', LR))
level0.append(('KNN', KNN ))
level0.append(('CART', DT ))
level0.append(('SVM', SVM ))
level0.append(('Naive Bayes', NB ))
# define meta learner model
#Classification Meta-Model: Use Logistic Regression.
level1 = LR
# define the stacking ensemble
model = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
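# fit the model
# Caution: fitting on the full X, y lets the test rows leak into training, so the
# test score below is optimistic; model.fit(X_train, y_train) avoids the leak
model.fit(X, y)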
#predict Target class on test data in Meta Classifier Model
y_pred = model.predict(X_test)

accuracy_meta = accuracy_score(y_test, y_pred)

print('Training Score: ', model.score(X_train, y_train).round(3))
print('Test Score: ', model.score(X_test, y_test).round(3))
print('Classification Report of Meta Classifier model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_meta.round(3))
cm_meta = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_meta = pd.DataFrame(cm_meta, index = label, columns = label)
sns.heatmap(cm1_meta, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
tempResultDf = pd.DataFrame({'Method':['Meta Classifier'], 'accuracy': [accuracy_meta]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.912
Test Score:  0.898
Classification Report of Meta Classifier model :
              precision    recall  f1-score   support

           0       0.85      0.73      0.79        15
           1       0.91      0.95      0.93        44

    accuracy                           0.90        59
   macro avg       0.88      0.84      0.86        59
weighted avg       0.90      0.90      0.90        59

Accuracy:  0.898
Out[393]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
0      Meta Classifier  0.898305
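
A stacked model gives an honest test score only when its base learners and meta-learner see the training split alone. The sketch below shows that pattern on synthetic data; the `make_classification` data, the `_demo` names and the choice of three base learners are assumptions for illustration, not the notebook's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split once into train and test
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42)

# Level-0 learners; StackingClassifier clones and refits them internally
base_demo = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("cart", DecisionTreeClassifier(max_depth=3, random_state=42)),
]

# Level-1 meta-learner is trained on cross-validated predictions of the base learners
stack_demo = StackingClassifier(estimators=base_demo,
                                final_estimator=LogisticRegression(max_iter=1000),
                                cv=5)
stack_demo.fit(X_tr, y_tr)            # training rows only
test_acc = stack_demo.score(X_te, y_te)
print("Held-out accuracy:", round(test_acc, 3))
```

Because the test rows never reach `fit`, the held-out accuracy is an unbiased estimate of how the stack generalises.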

8. Train at least one standard Ensemble model - Random forest, Bagging, Boosting etc, and note the accuracy

Random Forest Model

In [394]:
#Apply the Random forest model and print the accuracy of Random forest Model
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier(n_estimators = 100 ,random_state = random_state)
RF.fit(X_train, y_train)
#predict target class on test data
y_pred = RF.predict(X_test)

accuracy_RF = accuracy_score(y_test, y_pred)

print('Training Score: ', RF.score(X_train, y_train).round(3))
print('Test Score: ', RF.score(X_test, y_test).round(3))
print('Classification Report of Random Forest model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_RF.round(3))

cm_RF = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_RF = pd.DataFrame(cm_RF, index = label, columns = label)
sns.heatmap(cm1_RF, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['Random Forest'], 'accuracy': [accuracy_RF]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  1.0
Test Score:  0.932
Classification Report of Random Forest model :
              precision    recall  f1-score   support

           0       0.92      0.80      0.86        15
           1       0.93      0.98      0.96        44

    accuracy                           0.93        59
   macro avg       0.93      0.89      0.91        59
weighted avg       0.93      0.93      0.93        59

Accuracy:  0.932
Out[394]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
0      Meta Classifier  0.898305
0        Random Forest  0.932203
In [395]:
print('Feature Importance for Random Forest Classifier ', '--'*38)
feature_importances = pd.DataFrame(RF.feature_importances_, index = X.columns, 
                                   columns=['Importance']).sort_values('Importance', ascending = True)
feature_importances.sort_values(by = 'Importance', ascending = True).plot(kind = 'barh', figsize = (15, 7.2))
Feature Importance for Random Forest Classifier  ----------------------------------------------------------------------------
Out[395]:
<matplotlib.axes._subplots.AxesSubplot at 0xdc01e28ec8>

Random Forest, an ensemble of Decision Trees, gives the best test accuracy so far (0.932, tied with KNN).

Unlike the single Decision Tree, it uses all the independent attributes to predict the target class, although PPE still has the highest feature importance.

Advantage: high feature selection
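
As a rough illustration of feature ranking with a forest, the sketch below sorts `feature_importances_` into a ranked pandas Series. It runs on synthetic data with hypothetical column names `f0` to `f7` (assumptions, not the project's attributes):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with made-up column names
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     n_informative=3, random_state=42)
cols_demo = [f"f{i}" for i in range(X_demo.shape[1])]

rf_demo = RandomForestClassifier(n_estimators=100, random_state=42)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances are normalised to sum to 1 across features
ranking = pd.Series(rf_demo.feature_importances_, index=cols_demo)
ranking = ranking.sort_values(ascending=False)

print(ranking.head(3))
print("Features with non-zero importance:", int((ranking > 0).sum()))
```

Because each tree sees a random subset of features at every split, the importance is usually spread over more attributes than in a single shallow tree.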

Bagging Model

In [396]:
from sklearn.ensemble import BaggingClassifier

BAG = BaggingClassifier(n_estimators=50, max_samples= .7, bootstrap=True, oob_score=True, random_state=22)
BAG.fit(X_train, y_train)
#predict target class on test data
y_pred = BAG.predict(X_test)

accuracy_BAG = accuracy_score(y_test, y_pred)

print('Training Score: ', BAG.score(X_train, y_train).round(3))
print('Test Score: ', BAG.score(X_test, y_test).round(3))
print('Classification Report of Bagging Model:')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_BAG.round(3))

cm_BAG = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_BAG = pd.DataFrame(cm_BAG, index = label, columns = label)
sns.heatmap(cm1_BAG, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['Bagging'], 'accuracy': [accuracy_BAG]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.993
Test Score:  0.915
Classification Report of Bagging Model:
              precision    recall  f1-score   support

           0       1.00      0.67      0.80        15
           1       0.90      1.00      0.95        44

    accuracy                           0.92        59
   macro avg       0.95      0.83      0.87        59
weighted avg       0.92      0.92      0.91        59

Accuracy:  0.915
Out[396]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
0      Meta Classifier  0.898305
0        Random Forest  0.932203
0              Bagging  0.915254
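
The bagging cell sets `oob_score=True`, which makes scikit-learn keep an out-of-bag accuracy estimate in the fitted model's `oob_score_` attribute: each training row is scored only by the estimators whose bootstrap sample missed it. A minimal sketch on synthetic data (the `make_classification` data and the `_demo` names are assumptions, not the project's data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42)

# Same hyperparameters as the bagging cell; the default base learner is a decision tree
bag_demo = BaggingClassifier(n_estimators=50, max_samples=0.7, bootstrap=True,
                             oob_score=True, random_state=22)
bag_demo.fit(X_tr, y_tr)

oob_acc = bag_demo.oob_score_          # estimated from training rows left out of each bootstrap
test_acc = bag_demo.score(X_te, y_te)  # accuracy on the held-out split
print("OOB accuracy: ", round(oob_acc, 3))
print("Test accuracy:", round(test_acc, 3))
```

The out-of-bag figure gives a validation-style estimate without setting aside extra data, which is useful on a small dataset.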

Boosting Model

AdaBoost Classifier

In [397]:
# Apply Adaboost Ensemble Algorithm and print the accuracy.
from sklearn.ensemble import AdaBoostClassifier
ADABOOST = AdaBoostClassifier(n_estimators= 50, learning_rate=0.1, random_state=random_state)
ADABOOST.fit(X_train, y_train)
#predict target class on test data
y_pred = ADABOOST.predict(X_test)

accuracy_ADABOOST = accuracy_score(y_test, y_pred)

print('Training Score: ', ADABOOST.score(X_train, y_train).round(3))
print('Test Score: ', ADABOOST.score(X_test, y_test).round(3))
print('Classification Report of ADABOOST Ensemble Model:')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_ADABOOST.round(3))

cm_ADABOOST = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_ADABOOST = pd.DataFrame(cm_ADABOOST, index = label, columns = label)
sns.heatmap(cm1_ADABOOST, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['ADABOOST'], 'accuracy': [accuracy_ADABOOST]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  0.971
Test Score:  0.881
Classification Report of ADABOOST Ensemble Model:
              precision    recall  f1-score   support

           0       0.90      0.60      0.72        15
           1       0.88      0.98      0.92        44

    accuracy                           0.88        59
   macro avg       0.89      0.79      0.82        59
weighted avg       0.88      0.88      0.87        59

Accuracy:  0.881
Out[397]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
0      Meta Classifier  0.898305
0        Random Forest  0.932203
0              Bagging  0.915254
0             ADABOOST  0.881356

GradientBoost Classifier

In [398]:
#Apply GradientBoost Classifier Algorithm and print the accuracy
from sklearn.ensemble import GradientBoostingClassifier
GB = GradientBoostingClassifier(n_estimators = 20, random_state=random_state)
GB.fit(X_train, y_train)
#predict target class on test data
y_pred =GB.predict(X_test)

accuracy_GB = accuracy_score(y_test, y_pred)

print('Training Score: ', GB.score(X_train, y_train).round(3))
print('Test Score: ', GB.score(X_test, y_test).round(3))
print('Classification Report of Gradient Boosting Classifier Model :')
print(classification_report(y_test,y_pred))

print('Accuracy: ', accuracy_GB.round(3))

cm_GB = metrics.confusion_matrix(y_test, y_pred)

# confusion_matrix orders rows/columns as class 0, then class 1 (status 1 = PD in this dataset)
label = ["Parkinsons Disease not Affected", "Parkinsons Disease Affected"]
cm1_GB = pd.DataFrame(cm_GB, index = label, columns = label)
sns.heatmap(cm1_GB, annot = True, fmt = "d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

tempResultDf = pd.DataFrame({'Method':['Gradient Boosting'], 'accuracy': [accuracy_GB]})
resultDf = pd.concat([resultDf, tempResultDf])
resultDf = resultDf[['Method', 'accuracy']]
resultDf
Training Score:  1.0
Test Score:  0.915
Classification Report of Gradient Boosting Classifier Model :
              precision    recall  f1-score   support

           0       1.00      0.67      0.80        15
           1       0.90      1.00      0.95        44

    accuracy                           0.92        59
   macro avg       0.95      0.83      0.87        59
weighted avg       0.92      0.92      0.91        59

Accuracy:  0.915
Out[398]:
                Method  accuracy
0  Logistic Regression  0.881356
0          Naive Bayes  0.796610
0                  SVM  0.898305
0                  KNN  0.932203
0        Decision Tree  0.847458
0      Meta Classifier  0.898305
0        Random Forest  0.932203
0              Bagging  0.915254
0             ADABOOST  0.881356
0    Gradient Boosting  0.915254

9. Compare all the models (minimum 5) and pick the best one among them

In [404]:
best_model = []
best_model.append(('Logistic Regression', LR ))
best_model.append(('Naive Bayes', NB ))
best_model.append(('SVM', SVM ))
best_model.append(('Decision Tree', DT ))
best_model.append(('Meta-Classifier',model ))
best_model.append(('Random Forest', RF ))
best_model.append(('Bagging', BAG))
best_model.append(('AdaBoost', ADABOOST ))
best_model.append(('GradientBoost', GB ))


# Evaluate each model 
output = []
identifier = []
Best_scoring = 'accuracy'
# Perform k-fold Cross-Validation to evaluate the Performance metrics of all Classification Models
kfold = model_selection.KFold(n_splits=3)
# Loop variable is 'clf' so the stacking classifier stored in 'model' is not overwritten
for name, clf in best_model:
    cv_output = model_selection.cross_val_score(clf, X, y, cv=kfold, scoring=Best_scoring)
    output.append(cv_output)
    identifier.append(name)
    # Prints the best of the 3 fold scores; cv_output.mean() would give the average
    result = "%s: %f " % (name, cv_output.max())
    print(result)
    
# Using Box plot to find the Best Model 
fig = plt.figure(figsize=(15,15))
ax = fig.add_subplot(111)
plt.title("Model comparison of all standard classification and Ensemble Models")
plt.boxplot(output);
ax.set_xticklabels(identifier)
plt.show()
Logistic Regression: 0.846154 
Naive Bayes: 0.738462 
SVM: 0.815385 
Decision Tree: 0.769231 
Meta-Classifier: 0.815385 
Random Forest: 0.861538 
Bagging: 0.769231 
AdaBoost: 0.753846 
GradientBoost: 0.815385 
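
The comparison above reports the best single fold from an unshuffled 3-fold split. A common alternative is to summarise each model by the mean and standard deviation across stratified, shuffled folds, which keeps the class ratio the same in every fold. A minimal sketch on synthetic, imbalanced data (the `make_classification` data, the `_demo` names and the two candidate models are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with roughly a 25/75 class split
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     weights=[0.25, 0.75], random_state=42)

# Stratified folds preserve the class ratio; shuffling removes any ordering effect
cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

summary = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X_demo, y_demo, cv=cv_demo, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std())
    print("%s: mean=%.3f, std=%.3f" % (name, scores.mean(), scores.std()))
```

Comparing means together with their spread makes it easier to tell a genuinely better model from one that got a single lucky fold.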

Conclusion:

The ultimate goal is to classify patients as Parkinson's Disease affected or not affected, using the attributes from their voice recordings dataset.

Standard classification and ensemble machine learning models were applied to predict Parkinson's Disease accurately; such a model would be an effective screening step prior to an appointment with a clinician.

From the model comparison above, Random Forest is the best model: it gives the highest cross-validation accuracy (0.862, best of 3 folds) and the highest test accuracy (0.932), tied on the test set with KNN, which was not part of the cross-validation comparison. Random forests also hold their accuracy on large datasets, and some implementations handle missing values directly. They are a popular method for feature ranking, since their feature importances identify the attributes that best predict the target class.